motivated specialized hardware approaches in the past. 14, 16
Lastly, wireless PHY and media access control (MAC) protocols have low-latency real-time deadlines that must be met
for correct operation. For example, the 802.11 MAC protocol
requires precise timing control and ACK response latency on
the order of tens of microseconds. Existing software architectures on the PC cannot consistently meet this timing
requirement.
Sora addresses these challenges with novel hardware
and software designs. First, we have developed a new, inexpensive radio control board (RCB) with a radio front-end
for transmission and reception. The RCB bridges an RF
front-end with PC memory over the high-speed and low-latency PCIe bus. With this bus standard, the RCB can support 16.7Gbps (×8 mode) throughput with sub-microsecond
latency, which together satisfy the throughput and timing
requirements of modern wireless protocols while performing all digital signal processing on host CPU and memory.
Second, to meet PHY processing requirements, Sora
makes full use of various features of widely adopted multi-core architectures in existing GPPs. The Sora software
architecture explicitly supports streamlined processing
that enables components of the signal processing pipeline
to efficiently span multiple cores. Further, we change the
conventional implementation of PHY components to extensively take advantage of lookup tables (LUTs), trading off
computation for memory. These LUTs substantially reduce
the computational requirements of PHY processing, while
at the same time taking advantage of the large, low-latency
caches on modern GPPs. Finally, Sora uses the Single
Instruction Multiple Data (SIMD) extensions in existing processors to further accelerate PHY processing.
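To illustrate the computation-for-memory trade-off, the following minimal Python sketch (not Sora's actual implementation) replaces the bit-by-bit update of a 7-bit additive scrambler with two 128-entry tables indexed by the scrambler state. The polynomial x^7 + x^4 + 1 is the one used by the 802.11a scrambler; the function names, the LSB-first bit order, and the all-ones initial state are assumptions made for this example.

```python
def _step(state: int) -> tuple[int, int]:
    """Advance the 7-bit shift register by one bit (taps x^7 and x^4)."""
    feedback = ((state >> 6) ^ (state >> 3)) & 1
    return ((state << 1) | feedback) & 0x7F, feedback


def scramble_bitwise(data: bytes, state: int = 0x7F) -> bytes:
    """Reference version: eight shift-and-XOR iterations per byte."""
    out = bytearray()
    for byte in data:
        result = 0
        for i in range(8):  # assumed bit order: LSB first
            state, feedback = _step(state)
            result |= (((byte >> i) & 1) ^ feedback) << i
        out.append(result)
    return bytes(out)


def build_scrambler_lut() -> tuple[list[int], list[int]]:
    """Precompute, for every state, the next 8 scrambling bits and the
    state reached after those 8 steps."""
    seq, nxt = [0] * 128, [0] * 128
    for start in range(128):
        state, mask = start, 0
        for i in range(8):
            state, feedback = _step(state)
            mask |= feedback << i
        seq[start], nxt[start] = mask, state
    return seq, nxt


SEQ_LUT, NEXT_LUT = build_scrambler_lut()


def scramble_lut(data: bytes, state: int = 0x7F) -> bytes:
    """LUT version: one XOR and two table lookups per byte."""
    out = bytearray()
    for byte in data:
        out.append(byte ^ SEQ_LUT[state])
        state = NEXT_LUT[state]
    return bytes(out)
```

In this sketch, each pair of lookups replaces eight shift-and-XOR iterations, at the cost of 256 small table entries that fit comfortably in a first-level cache.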
Lastly, to meet the real-time requirements of high-speed
wireless protocols, Sora provides a new kernel service, core
dedication, which allocates processor cores exclusively for
real-time SDR tasks. We demonstrate that it is a simple
yet crucial abstraction that guarantees the computational
resources and precise timing control necessary for SDR on
a multi-core GPP.
We have developed a few demonstration wireless systems based on the Sora platform, including: (1) SoftWiFi,
an 802.11a/b/g implementation that supports a full suite
of modulation rates (up to 54Mbps) and seamlessly interoperates with commercial 802.11 NICs, and (2) SoftLTE,
a 3GPP LTE uplink PHY implementation that supports up to
43.8Mbps data rate.
The rest of the paper is organized as follows. Section 2
provides background on wireless communication systems.
We then present the Sora architecture in Section 3, and we
discuss our approach for addressing the challenges of building
an SDR platform on a GPP system in Section 4. We then
describe the implementation of the Sora platform in Section 5.
Section 6 provides a quantitative evaluation of the radio
systems based on Sora. Finally, Section 7 describes related
work and Section 8 concludes.
2. Background and Requirements
In this section, we briefly review the PHY and MAC components of typical wireless communication systems. Although
different wireless technologies may have subtle differences
among one another, they generally follow similar designs
and share many common algorithms. We use
the IEEE 802.11a/b/g standards to exemplify characteristics
of wireless PHY and MAC components as well as the challenges of implementing them in software.
2.1. Wireless PHY
The role of the PHY layer is to convert information bits into
a radio waveform, or vice versa. At the transmitter side, the
wireless PHY component first modulates the message (i.e., a
MAC frame) into a time sequence of digital baseband signals.
Digital baseband signals are then passed to the radio front-end, where they are converted to an analog waveform, multiplied
by a high frequency carrier and transmitted into the wireless
channel. At the receiver side, the radio front-end receives
radio signals in the channel and extracts the baseband waveform by removing the high-frequency carrier. The extracted
baseband waveform is digitized and converted back into
digital signals. Then, the digital baseband signals are fed into
the receiver’s PHY layer to be demodulated into the original
message.
The PHY layer directly operates on the digital baseband signals after modulation on the transmitter side and
before demodulation on the receiver side. Therefore, high-throughput interfaces are needed to connect the PHY layer
and the radio front-end. The required throughput linearly
scales with the bandwidth of the baseband signal as well as
the number of antennas in a MIMO system. For example, the
channel width is 20MHz in 802.11a. It requires a data rate of
at least 20M complex samples per second to represent the
waveform. These complex samples normally require 16-bit
quantization for both in-phase and quadrature (I/Q) components to provide sufficient fidelity, translating into 32 bits
per sample, or 640Mbps for the full 20MHz channel. Oversampling, a technique widely used for better performance, 11
doubles the requirement to 1.28Gbps. With 4 × 4 MIMO
and a 40-MHz channel, as specified in 802.11n, the requirement grows a further eightfold (four antennas at twice the bandwidth), to about 10Gbps, to move data between
the RF front-end and PHY for one channel.
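The arithmetic behind these figures can be reproduced with a short Python sketch. The function and parameter names are hypothetical; the model simply multiplies the sample rate (assumed equal to the channel bandwidth times the oversampling factor) by the bits per complex sample and the number of antennas.

```python
def iq_throughput_bps(bandwidth_hz: int,
                      bits_per_component: int = 16,
                      oversampling: int = 1,
                      antennas: int = 1) -> int:
    """Bits per second needed to move I/Q samples between the radio
    front-end and the PHY, under the simple model described above."""
    bits_per_sample = 2 * bits_per_component        # I and Q components
    samples_per_second = bandwidth_hz * oversampling
    return samples_per_second * bits_per_sample * antennas


MHZ = 1_000_000

base = iq_throughput_bps(20 * MHZ)                            # 802.11a channel
oversampled = iq_throughput_bps(20 * MHZ, oversampling=2)     # 2x oversampling
mimo = iq_throughput_bps(40 * MHZ, oversampling=2, antennas=4)  # 4 x 4, 40MHz

print(base / 1e6, "Mbps")         # 640.0 Mbps
print(oversampled / 1e9, "Gbps")  # 1.28 Gbps
print(mimo / 1e9, "Gbps")         # 10.24 Gbps
```

Under these assumptions, the 802.11n configuration needs 10.24Gbps, which the text rounds to 10Gbps.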
Advanced communication systems (e.g., IEEE 802.11a/b/g,
as shown in Figure 1) contain multiple functional blocks in
their PHY components. These functional blocks are pipelined with one another. Data are streamed through these
blocks sequentially, but with different data types and sizes.
As illustrated in Figure 1, different blocks may consume or
produce different types of data at different rates arranged
in small data blocks. For example, in 802.11b, the scrambler may consume and produce one bit, while DQPSK
modulation maps each two-bit data block onto a complex
symbol, whose real and imaginary components represent I and
Q, respectively.
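As a small illustration of such a block, the Python sketch below maps each two-bit data block onto one complex symbol by differential phase rotation. The dibit-to-phase table (00: 0, 01: π/2, 11: π, 10: 3π/2) is the mapping we assume for 802.11b DQPSK; the function name and the reference symbol 1 + 0j are choices made for this example only.

```python
# Assumed dibit -> phase-rotation table, expressed as unit complex factors.
ROTATION = {
    (0, 0): 1 + 0j,   # phase change 0
    (0, 1): 1j,       # phase change pi/2
    (1, 1): -1 + 0j,  # phase change pi
    (1, 0): -1j,      # phase change 3*pi/2
}


def dqpsk_modulate(bits: list[int], reference: complex = 1 + 0j) -> list[complex]:
    """Consume bits two at a time and produce one complex symbol per pair.

    The information is carried by the phase change relative to the
    previous symbol; the real part is I and the imaginary part is Q.
    """
    if len(bits) % 2 != 0:
        raise ValueError("DQPSK consumes bits in blocks of two")
    symbols = []
    current = reference
    for k in range(0, len(bits), 2):
        current = current * ROTATION[(bits[k], bits[k + 1])]
        symbols.append(current)
    return symbols


symbols = dqpsk_modulate([0, 0, 0, 1, 1, 1, 1, 0])
for s in symbols:
    print(f"I={s.real:+.0f}  Q={s.imag:+.0f}")
```

The block consumes two bits and produces one complex sample per step, whereas a scrambler placed before it consumes and produces single bits; this mismatch in data types and rates is what the pipeline must accommodate.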
Each PHY block performs a fixed amount of computation
on every transmitted or received bit. When the data rate is
high, e.g., 11Mbps for 802.11b and 54Mbps for 802.11a/g,
PHY processing blocks consume a significant amount of
computational power. Based on the model in Neel et al., 16
we estimate that a direct implementation of 802.11b may
require 10GOPS, while 802.11a/g needs at least 40GOPS.
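Because each block does a fixed amount of work per bit, these estimates imply a per-bit operation budget. The following Python sketch is only arithmetic on the figures quoted above; the function name is hypothetical.

```python
def ops_per_bit(gops: float, data_rate_mbps: float) -> float:
    """Operations available per information bit at a given data rate."""
    return (gops * 1e9) / (data_rate_mbps * 1e6)


for name, gops, rate in [("802.11b", 10, 11), ("802.11a/g", 40, 54)]:
    print(f"{name}: about {ops_per_bit(gops, rate):.0f} operations per bit")
```

Both estimates correspond to several hundred operations for every bit carried, which indicates why a direct implementation strains a general-purpose processor at these data rates.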