consider whether a new language is
necessary at all. Although from a functional point of view “it’s all just computation,” hardware system design is
characterized by features very different
from software design.
Hardware systems typically have parallelism
that is massive, fine-grain, heterogeneous, and reactive. (Unlike some authors, I make no distinction between
the terms concurrency and parallelism.)
This parallelism results in increased
speed, which is often the main reason
to implement something in hardware.
Massive in this context means that the
number of parallel activities may number in the thousands or even millions.
Fine-grain means that parallel activities
interact frequently (measured in time
scales of clock cycles) over possibly
very small shared resources (such as
individual registers of a few bits).
Heterogeneous means that parallel activities involve diverse tasks—in contrast,
for example, to SIMD (single instruction, multiple data) or SPMD (single
program, multiple data) computation.
Finally, reactive means that parallel activities are often triggered by asynchronous events, such as unpredictable
data arrival, interrupts, arbitration
resolution, and I/O.
Unfortunately, most software languages are not parallel at all, but instead
are entirely sequential. Some extensions
for massive parallelism address only
SIMD parallelism. Thread extensions
can handle heterogeneity, but they are
notoriously difficult for massive, fine-grain, or reactive parallelism.1,15,22

Figure 1. Hardware vs. software design (computation model and its costs).
Architecture and algorithm. In
hardware system design, good architecture (with its attendant cost model)
is a central design goal and an outcome
of design activity, whereas software is
mostly designed for a fixed, given input
architecture (typically von Neumann,
perhaps with extensions such as
SIMD), as illustrated in Figure 1. Thus,
in hardware design, algorithm and architecture are inseparable, and it is
meaningless to talk about “pure algorithm design” in the abstract, without
some architectural model that gives it
a concrete cost metric.
Since existing programming languages
are all Turing-complete, any architecture can be modeled or
simulated in them, but there are two fundamental problems with this. First, it
takes extraordinary discipline (and
therefore a lot of time and effort) to
model a complex architecture accurately. Second, modeling an architecture often means losing orders of
magnitude in execution speed, both
because of the extra layer of interpretation and because the natural parallelism of the architecture being modeled
is essentially completely discarded.
Domain-specific languages (DSLs) can
address the first issue, but not the second. For most software programming
languages, the “native” architectural
model is the von Neumann model (one
sequential process, with constant-time
access to a large, flat memory), and it is
only native von Neumann algorithms
that execute at speed.
The term architectural transparency
expresses the idea that the source program directly reflects the desired architecture. Abstraction mechanisms
can hide detail but should not distort
the resulting architecture. This property is essential for the programmer/
designer, for whom abstraction is
good, provided that it does not compromise predictability and control.
This property is also essential for compilers (synthesis) to produce efficient hardware.

The Inadequacy of Existing Languages
As mentioned, today’s SoCs need
FPGA-based emulation to achieve acceptable simulation speed, and this
requires universal applicability of the HDL
(models, test benches, and implementations of the full gamut of SoC components) and universal synthesizability.
Unfortunately, no software languages
are suitable for this.
Many in the industry advocate using
a combination of C++ and SystemC—
the former to describe the algorithmic content of individual intellectual
property (IP) blocks, and the latter to
describe system-level communication,
hierarchy, and integration of these
IP blocks into an SoC. The C++ inside
IP blocks can then be subjected to
so-called high-level synthesis5 (HLS),
using tools that automatically generate parallel hardware from sequential
C++, given declarative objectives such
as latency, throughput, area, power,
and target silicon technology. A detailed critique would require another
article, but this section and Stephen A.
Edwards’ article on the topic7 provide some perspective.
Let’s focus first on the computation
model of BSV (that is, its behavioral
semantics), because that is the greatest limitation of existing software languages for hardware design. Then we
present an example to demonstrate the
use of modern structural abstraction mechanisms.

Rules: The basic computation model. Verilog and VHDL, with their origins as simulation languages, are built
on the uniprocessor von Neumann