The four papers presented here
provide an overview of how FPGAs are
being integrated into datacenters and
how they are being used to make data
processing more efficient. They are presented in two groups, one showing how
designs in this area are quickly evolving
and one detailing some of the ongoing
debates around FPGAs.
FPGAs by Design
A. Putnam, A.M. Caulfield, E.S. Chung, et al.
A reconfigurable fabric for accelerating
large-scale datacenter services. In Proceedings
of the 41st ACM/IEEE International
Symposium on Computer Architecture, 2014;
A.M. Caulfield, E.S. Chung, A. Putnam, et al.
A cloud-scale acceleration architecture.
In Proceedings of the 49th IEEE/ACM
International Symposium on Microarchitecture, 2016.
These two papers are part of a series of
publications by Microsoft describing
Project Catapult (
https://www.microsoft.com/en-us/research/project/project-catapult/). The first paper provides
insights into the development process
of FPGA-based systems. The target application is accelerating the Bing Web
search engine. The configuration involves one FPGA per server, connected
to the host through peripheral component interconnect (PCI). A separate
network, independent of the conventional network, connects the FPGAs
to each other using a six-by-eight, two-dimensional torus topology. The paper
shows how such a system can improve
the throughput of document ranking
or reduce the tail latency for such operations by 29%.
The second paper builds on the
lessons learned from the first. The
Web-search accelerator was based on
a unit of 48 machines, a result of the
decision to use a torus network to
connect the FPGAs to each other. Not
only is the cabling of such units cumbersome, but it also limits how many FPGAs can talk to each other directly and requires each FPGA to implement routing logic as well as complex fault-tolerance procedures.
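To make the scaling constraint concrete, here is a minimal sketch of neighbor addressing in a six-by-eight two-dimensional torus (the dimensions come from the paper; the code itself is illustrative and not part of Catapult):

```python
# Neighbor addressing in the 6x8 2D torus of the first Catapult design.
# Dimensions are from the paper; helper names are illustrative only.
ROWS, COLS = 6, 8  # 48 FPGAs per unit

def neighbors(r, c):
    """Return the four torus neighbors of FPGA (r, c); edges wrap around."""
    return [
        ((r - 1) % ROWS, c),  # up
        ((r + 1) % ROWS, c),  # down
        (r, (c - 1) % COLS),  # left
        (r, (c + 1) % COLS),  # right
    ]

# Every FPGA has exactly four direct links, so a message to any of the
# other 43 FPGAs must be routed hop by hop through intermediate FPGAs --
# the routing burden the second paper eliminates by reusing the
# datacenter network instead.
print(neighbors(0, 0))
```

The wrap-around arithmetic shows why the unit size is fixed at 48: the torus dimensions are baked into the cabling and the per-FPGA routing logic.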
In the cloud, scaling and efficiently
using such a design is problematic.
Hence, the second paper describes
the solution being deployed in Azure:
the FPGA is placed between the NIC
(network interface controller) of the
host and the actual network, as well as
having a PCI connection to the host.
All network traffic goes through the
FPGA. The motivation for this is that
the regular 40Gbps network available
in the cloud can also be used to connect the FPGAs to each other without
a limitation on the number of FPGAs
directly connected. With this design,
the FPGA can be used as a coprocessor (linked to the CPU through PCI)
or as a network accelerator (in front of
the NIC), with the new resource being
available through the regular network
and without any of the limitations of
the previous design. The design makes
the FPGA available to applications, as
well as to the cloud infrastructure,
widening the range of potential uses.
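The bump-in-the-wire placement can be sketched schematically; the function and field names below are invented for illustration and do not reflect Catapult's actual interfaces:

```python
# Schematic sketch of the second Catapult design: every packet between
# the network and the host NIC traverses the FPGA, which either consumes
# it (inter-FPGA or accelerator traffic) or forwards it to the host.
# All names here are illustrative, not from the paper.
def fpga_datapath(packet):
    if packet["dst"] == "fpga":
        # FPGA-to-FPGA messages travel over the regular 40Gbps network,
        # so any FPGA can reach any other without a dedicated fabric.
        return ("accelerated", packet["payload"])
    # Ordinary host traffic passes through to the NIC untouched.
    return ("to_host", packet["payload"])
```

The key point the sketch captures is that one physical placement supports both roles: coprocessor traffic arrives over PCI or the network, while host traffic is simply forwarded.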
FPGAs as Debate
L. Woods, Z. István, and G. Alonso
Ibex—An intelligent storage engine with
support for advanced SQL off-loading. In
Proceedings of the VLDB Endowment 7, 11
I. Jo, D.-H. Bae, A.S. Yoon, J.-U. Kang,
S. Cho, D.G. Lee, and J. Jeong
YourSQL: A high-performance database
system leveraging in-storage computing.
In Proceedings of the VLDB Endowment 9, 12
These two papers illustrate an oft-heard debate around FPGAs. If the
functionality provided in the FPGA is
so important, can it not be embedded
in an ASIC or a dedicated component
for even higher performance and power efficiency? The first paper shows how to extend the MySQL database with an SSD+FPGA-based storage engine that can offload queries, or parts of queries, to run near the storage. The result is much-reduced data movement from storage to the database engine, in addition to significant performance gains.
The second paper uses an identical
database scenario and configuration
but replaces the FPGA with the processor already available in the SSD (solid-state drive) device. Doing so avoids the
data transfer from the SSD to the FPGA,
which is now reduced to reading the
data from storage into the processor
used to manage the SSD.
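The data-movement argument behind both papers can be illustrated with a toy filter; the code below is a schematic sketch and not either system's actual interface:

```python
# Toy illustration of near-storage query offloading: only qualifying
# rows cross the storage-to-host link. Names and numbers are invented
# for illustration, not taken from Ibex or YourSQL.
rows = [(i, i % 100) for i in range(10_000)]  # (key, value) pairs "on disk"

def scan_in_engine(rows, pred):
    # Conventional path: ship every row to the database engine, filter there.
    shipped = list(rows)  # all rows cross the link
    return [r for r in shipped if pred(r)], len(shipped)

def scan_near_storage(rows, pred):
    # Offloaded path: the FPGA (or the SSD's own processor) applies the
    # predicate before the data leaves storage.
    shipped = [r for r in rows if pred(r)]  # only matches cross the link
    return shipped, len(shipped)

pred = lambda r: r[1] < 5  # e.g. WHERE value < 5
res1, moved1 = scan_in_engine(rows, pred)
res2, moved2 = scan_near_storage(rows, pred)
assert res1 == res2  # same answer, far less data movement
print(moved1, moved2)
```

Both papers exploit the second path; they differ only in which processor near the storage evaluates the predicate.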
As these two papers illustrate, the
efficiency advantages derived from
using a specialized processor must
be balanced with the ability to repurpose the accelerator, a discussion
that mirrors the steps taken by Microsoft designers toward refining the
architecture of Catapult to increase
the number of potential use cases.
In a cloud setting, database applications would greatly benefit from an
SSD capable of processing queries.
All other applications, however, cannot do much with it, illustrating the trade-off between specialization (that is, performance) and generality (flexibility of use) that is common in FPGA designs.
FPGAs are slowly leaving the niche space
they have occupied for decades (for
example, circuit design, customized
acceleration, and network management) and are now becoming processing elements in their own right. This
is a fascinating phase where different
architectures and applications are being tested and deployed. As FPGAs are
redesigned to use the latest technologies, it is reasonable to expect they
will offer larger capacity, higher clock
rates, higher memory bandwidth, and
more functionality, and become available in off-the-shelf configurations
suitable for datacenters. How it all develops will be fascinating to watch in
the coming years.
Gustavo Alonso is a professor of computer science at ETH
Zürich, Switzerland, where he is a member of the Systems
Group (www.systems.ethz.ch). His recent research includes
multicore architectures, data appliances, cloud computing,
and hardware acceleration, with the main goal of adapting
system software to modern hardware platforms.
Copyright held by owner/author.
Publication rights licensed to ACM. $15.00.