[Figure 2 diagram. Panel (a), tile: modified OpenSPARC T1 core, CCX arbiter, FPU, L1.5 cache, directory cache, MITTS (traffic shaper), P-Mesh routers (×3). Panel (b), chipset: chip bridge, P-Mesh off-chip routers (×3), P-Mesh XBars (×3), DRAM, SDHC, I/O.]
Figure 2. Architecture of (a) a tile and (b) chipset.
option or configuration file. OpenPiton is easy to extend:
the presence of a well-documented core, a well-documented
coherence protocol, and an easy-to-interface NoC makes
adding research features straightforward. Research extensions to OpenPiton that have already been built include
several novel memory system explorations, an Oblivious
RAM controller, and a new in-core thread scheduler. The
validated and mature ISA and software ecosystem support OS and compiler research. The release of OpenPiton’s
scripts for FPGA emulation and chip manufacture makes
it easy for others to port to new FPGAs or semiconductor process technologies. In particular, this enables CAD
researchers who need large netlists to evaluate their algorithms at scale.
2. THE OPENPITON PLATFORM
OpenPiton is a tiled, manycore architecture, as shown in
Figure 1. It is designed to be scalable, both intra-chip and
inter-chip, using the P-Mesh cache coherence system.
Intra-chip, tiles are connected via three P-Mesh networks-on-chip
(NoCs) in a scalable 2D mesh topology (by default).
The NoC router address space supports scaling up to 256
tiles in each dimension within a single OpenPiton chip (64K
cores/chip).
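The per-chip limit follows from the coordinate range. As a minimal sketch, assuming each tile is addressed by an (x, y) pair with 8 bits per dimension (an inference from the 256-tile limit, not a description of the actual P-Mesh header layout), the arithmetic can be checked as follows. The names `COORD_BITS`, `tile_id`, and `tile_coords` are hypothetical and chosen for illustration only.

```python
from typing import Tuple

# Hypothetical sketch: 8 bits per coordinate is inferred from the
# 256-tiles-per-dimension limit; it is not taken from the P-Mesh spec.
COORD_BITS = 8
TILES_PER_DIM = 1 << COORD_BITS      # 256 tiles in each dimension
MAX_TILES = TILES_PER_DIM ** 2       # 65,536 tiles, one core per tile


def tile_id(x: int, y: int) -> int:
    """Flatten an (x, y) mesh coordinate into a row-major tile index."""
    if not (0 <= x < TILES_PER_DIM and 0 <= y < TILES_PER_DIM):
        raise ValueError(f"coordinate ({x}, {y}) is outside the mesh")
    return y * TILES_PER_DIM + x


def tile_coords(index: int) -> Tuple[int, int]:
    """Inverse of tile_id: recover (x, y) from a row-major tile index."""
    if not 0 <= index < MAX_TILES:
        raise ValueError(f"tile index {index} is out of range")
    return index % TILES_PER_DIM, index // TILES_PER_DIM
```

With these assumptions, `MAX_TILES` evaluates to 65,536, which matches the 64K cores/chip figure quoted above.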
For inter-chip communication, the chip bridge extends
the three NoCs off-chip, connecting the tile array (through
the tile in the upper-left) to off-chip logic (chipset). The chipset may be implemented on an FPGA, as a standalone chip,
or integrated into an OpenPiton chip.
The extension of the P-Mesh NoCs off-chip allows the
seamless connection of multiple OpenPiton chips to create a larger system, as shown in Figure 1. OpenPiton’s
cache coherence extends off-chip as well, enabling shared memory across multiple chips for the study of even larger
shared-memory manycore systems.
2.1. Tile
The architecture of a tile is shown in Figure 2a. A tile consists
of a core, an L1.5 cache, an L2 cache, a floating-point unit (FPU),
a CPU-Cache Crossbar (CCX) arbiter, a Memory Inter-arrival
Time Traffic Shaper (MITTS), and three P-Mesh NoC routers.
The L2 and L1.5 caches connect directly to all three NoC
routers, and all messages entering and leaving the tile
traverse these interfaces. The CCX is the crossbar interface used
user to compose them together, OpenPiton is designed with
all of the components integrated into the same easy-to-use
build infrastructure, providing push-button scalability.
Researchers can easily deploy OpenPiton’s source code, add
in modifications, and explore their novel research ideas in
the setting of a fully working system. Thousands of targeted,
high-coverage test cases are provided to enable researchers
to innovate with a safety net that ensures functionality is
maintained. OpenPiton’s open source nature also makes it
easy to release modifications and reproduce previous work
for comparison or reuse.
Rather than simply being a platform designed by computer architects for use by computer architects, OpenPiton
enables researchers in other fields, including operating systems (OS), security, compilers, runtime tools, systems, and
computer-aided design (CAD) tools, to conduct research at scale. To enable such a wide range of applications,
OpenPiton is configurable and extensible. The number of
cores, attached I/O, size of caches, in-core parameters, and
network topology are all configurable from a command-line
[Figure 1 diagram; labels: Tile, Chip.]
Figure 1. OpenPiton architecture. Multiple manycore chips are
connected together with chipset logic and networks to build large
scalable manycore systems. OpenPiton’s cache coherence protocol
extends off chip.
Table 1. Supported OpenPiton configuration options. Bold indicates
default values. (*Associativity reduced to 2 ways at smallest size.)

Component                 Configurability options
------------------------  ----------------------------------------------
Cores (per chip)          Up to 65,536
Cores (per system)        Up to 500 million
Threads per core          1/2/4
Floating-point unit       Present/Absent
Stream-processing unit    Present/Absent
TLBs                      8/16/32/64 entries
L1 I-cache                8*/16/32KB
L1 D-cache                4*/8/16KB
L1.5 cache                Number of sets, ways (8KB, 4-way)
L2 cache (per tile)       Number of sets, ways (64KB, 4-way)
Intra-chip topologies     2D mesh, crossbar
Inter-chip topologies     2D mesh, 3D mesh, crossbar, butterfly network
Bootloading               SD/SDHC card, UART
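
To illustrate how the option space in Table 1 constrains a build, the following sketch checks a candidate configuration against the listed values. It is not OpenPiton's build interface: the class `ChipConfig`, its field names, and the topology strings are hypothetical, and only the permitted values are taken from Table 1.

```python
from dataclasses import dataclass

# Allowed values transcribed from Table 1. The field names are
# hypothetical and do not correspond to actual OpenPiton build flags.
ALLOWED = {
    "threads_per_core": {1, 2, 4},
    "tlb_entries": {8, 16, 32, 64},
    "l1i_kb": {8, 16, 32},
    "l1d_kb": {4, 8, 16},
    "intra_chip_topology": {"2d_mesh", "crossbar"},
}
MAX_CORES_PER_CHIP = 65_536


@dataclass
class ChipConfig:
    cores: int
    threads_per_core: int
    tlb_entries: int
    l1i_kb: int
    l1d_kb: int
    intra_chip_topology: str
    fpu_present: bool

    def errors(self) -> list:
        """Return human-readable descriptions of any Table 1 violations."""
        problems = []
        if not 1 <= self.cores <= MAX_CORES_PER_CHIP:
            problems.append(
                f"cores={self.cores} outside 1..{MAX_CORES_PER_CHIP}"
            )
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                problems.append(f"{name}={value!r} not in {sorted(allowed)}")
        return problems


# Usage: a 16-core configuration built only from values listed in Table 1.
cfg = ChipConfig(cores=16, threads_per_core=2, tlb_entries=16,
                 l1i_kb=16, l1d_kb=8, intra_chip_topology="2d_mesh",
                 fpu_present=True)
```

Calling `cfg.errors()` on the example returns an empty list, while a value outside the table (for instance, three threads per core) produces one entry per violated option.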