bad idea for one basic reason: It must
perform moderately complex computation at least as fast as a computer’s
“main” CPU, yet manufacturers insist
it be cheap due to the cost “multiplier”
effect when a computer includes many
network links. To show that solving
this problem is impossible, consider
the following proof by contradiction:
Assume an NFE processor exists with the requisite attributes: moderately general-purpose, fast, and cheap. Most
computer manufacturers would then use it as
a main processor, not as a lowly NFE,
and their computers would in turn need even more network
bandwidth.
Many computer engineers have
long understood, as O’Dell wrote, that
the most efficient way to implement a
network-protocol software stack is to
use one or more CPUs of an N-way SMP,
but users strongly resist this idea when
they discover they’ve paid big for what,
from an application point of view, is
only an N-1-way computer; note, too,
popular “low-end” cases in which N = 2 or
4. Apparently, Sun Microsystems (“the
network is the computer”) hasn’t left
much of an impression on IT managers. The result is that NFE startups
continue to waste engineering talent
by trying to pour a quart of technology
into this particular pint jar.
Scott Marovich, Palo Alto, CA
How to Address Big Data
I want to thank Adam Jacobs for cataloging the important issues in the management of large volumes of data in his
article “The Pathologies of Big Data”
(Aug. 2009). But please know that innovations are also being made by relational database vendors. One recently
released product is the HP Oracle Database Machine (http://www.oracle.com/database/database-machine.html) that
processes large volumes of data using
massive horizontal parallelism across
a shared-nothing storage grid. Pipeline (vertical) parallelism is enabled by
offloading data processing to a storage
grid, so the amount of data that must
be shipped back from the storage grid
to the database grid is reduced as well.
The storage grid and database grid are
connected through a high-speed InfiniBand interconnect.
A single DM has an I/O bandwidth
of 14GB/sec in the first version of the
product (with uncompressed data). If
the data is compressed, the effective
bandwidth is much greater, depending on compression ratio. (Jacobs’s experiment would have run much faster
on the DM.) Multiple DMs can be connected to increase data capacity, along
with corresponding network/computing capacity.
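As a rough illustration of the compression remark above, the sketch below estimates scan times under the simplifying assumption that effective bandwidth scales linearly with the compression ratio. Only the 14GB/sec figure comes from the letter; the function names, the 1,000GB data size, and the ratios are hypothetical, and real throughput would also depend on decompression cost and workload.

```python
# Figure quoted in the letter for one DM with uncompressed data.
RAW_BANDWIDTH_GB_PER_SEC = 14.0


def effective_bandwidth(raw_gb_per_sec, compression_ratio):
    """Logical GB scanned per second under an idealized linear model (assumption)."""
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be >= 1")
    return raw_gb_per_sec * compression_ratio


def scan_seconds(logical_gb, raw_gb_per_sec, compression_ratio=1.0):
    """Seconds needed to scan logical_gb of data stored at the given ratio."""
    return logical_gb / effective_bandwidth(raw_gb_per_sec, compression_ratio)


if __name__ == "__main__":
    # Hypothetical compression ratios and data size, for illustration only.
    for ratio in (1.0, 2.0, 4.0):
        secs = scan_seconds(1000.0, RAW_BANDWIDTH_GB_PER_SEC, ratio)
        print(f"ratio {ratio:.0f}x: {secs:.1f} s to scan 1,000 GB")
```

Under this model, doubling the compression ratio halves the scan time; any measured system would likely fall short of that ideal.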
Oracle’s Automatic Storage Management (ASM) is an integral part of
DM, providing automatic load balancing across all nodes in the storage
grid. ASM also provides fault-tolerance
through dual or triple mirroring. The
Oracle database provides high-availability and disaster-recovery features,
and ASM enables sequential I/Os
through its allocation strategies (such
as large allocations).
One of the best ways to improve query performance (assuming the
optimal access method is used) is to
avoid I/O altogether. The Oracle database provides rich partitioning strategies that enable skipping large chunks
of data that do not qualify for a query
scan.
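The general idea of skipping partitions that cannot satisfy a query can be sketched as follows. This is a minimal, hypothetical illustration of range-partition pruning, not Oracle's implementation; the Partition class, the prune function, the partition names, and the row counts are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Partition:
    name: str
    start: date  # inclusive lower bound of the partition's date range
    end: date    # exclusive upper bound of the partition's date range
    rows: int


def prune(partitions, query_start, query_end):
    """Keep only partitions whose [start, end) overlaps [query_start, query_end)."""
    return [p for p in partitions if p.start < query_end and query_start < p.end]


# Hypothetical table partitioned by quarter.
PARTITIONS = [
    Partition("sales_2009_q1", date(2009, 1, 1), date(2009, 4, 1), 1_000_000),
    Partition("sales_2009_q2", date(2009, 4, 1), date(2009, 7, 1), 1_200_000),
    Partition("sales_2009_q3", date(2009, 7, 1), date(2009, 10, 1), 900_000),
    Partition("sales_2009_q4", date(2009, 10, 1), date(2010, 1, 1), 1_500_000),
]

if __name__ == "__main__":
    # A query restricted to May through July touches only two of four partitions.
    kept = prune(PARTITIONS, date(2009, 5, 1), date(2009, 8, 1))
    skipped_rows = sum(p.rows for p in PARTITIONS) - sum(p.rows for p in kept)
    print([p.name for p in kept], "rows never read:", skipped_rows)
```

The overlap test uses only partition bounds, so the decision to skip a partition is made before any of its data is read.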
Moving data from production
(OLTP systems) to a specialized data
store adds to a system’s total cost of
ownership, as one would otherwise
be managing two different data stores
with poor or no integration. DM solves
the big-data problem without a special
data store just for data analysis; DM
provides a single view of the data.
Though I am a technical member of
the Oracle Exadata development team,
my aim here is not to plug the product
but report that the big-data problem is
indeed being tackled, particularly by
relational database vendors.
Umesh Panchaksharaiah, Richmond, CA
My Generation
Samuel Greengard’s news story “Are
We Losing Our Ability to Think Critically?” (July 2009) is inspiring as a basis
for future work. I am coordinating an
interdisciplinary seminar on the collective construction of knowledge (http://seminario.edusol.info, in Spanish), including two topics Greengard might be
able to bridge: One is free software in
a democratic society, inciting people to
be more politically active and involved,
despite being (usually) independent of
political parties and other traditional
means of shaping society. The other is
how motivation and peer-recognition
function in these communities; such
free-culture communities have much
in common with scientific communities, despite starting off with completely different motivations.
My generation (born in the 1970s),
including many people in the free software movement, has directly experienced the great shift computing and
networking have brought the world,
fully embracing the technologies. The
greatest difference between people
who are just users of computing and
those striving to make it better depends on who has the opportunity to
appropriate it beyond, say, the distraction level, the blind Google syndrome,
or the simple digestion of “piles of data
and information [that] do not equate to
greater knowledge and better decision
making.”
Thanks to Greengard for sparking
some useful thoughts.
Gunnar Wolf, Mexico City
Communications welcomes your opinion. To submit a
Letter to the editor, please limit your comments to 500
words or less and send to letters@cacm.acm.org.
© 2009 ACM 0001-0782/09/1200 $10.00