Article development led by
Use the database built for your access model.
BY RICK RICHARDSON
THE TOPIC OF data storage is one that does not need to be
well understood until something goes wrong (data
disappears) or something goes really right (too many
customers). Because databases can be treated as black
boxes with an API, their inner workings are often
overlooked. They are treated as magic things that
just take data when offered and supply it when asked.
Because these two operations are the only well-understood
activities of the technology, they are frequently the only features
presented when comparing different technologies.
Benchmarks are often provided in operations per
second, but what exactly is an operation? Within the
realm of databases, this could mean any number of
things. Is that operation a transaction? Is it an indexing
of data? A retrieval from an index? Does it store the
data to a durable medium such as a hard disk, or does
it beam it by laser toward Alpha Centauri?
It is this ambiguity that causes havoc in the
software industry. Misunderstanding the features and
guarantees of a database system can cause, at best,
user consternation due to slowness or unavailability.
At worst, it could result in fiscal damage—or even jail
time due to data loss.
The scope of the term database is
vast. Technically speaking, anything
that stores data for later retrieval is a
database. Even by that broad definition, there is functionality that is common to most databases. This article
enumerates those features at a high
level. The intent is to provide readers
with a toolset with which they might
evaluate databases on their relative
merits. Because the topics cannot be
covered here in the detail they deserve,
references to additional reading have
been included. These topics may be the
subjects for future articles.
This feature-driven approach should
allow readers to assess their own
needs and to compare technologies
by pairing up like features. When
viewed through this lens, comparative benchmarks are valid only for
databases that are performing equal work
and providing the same guarantees.
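One way to make this pairing mechanical is to record, next to every published operations-per-second figure, the guarantees the system provided while producing it, and to refuse to compare figures whose guarantee sets differ. The minimal Python sketch below does that. The class and function names, the system names, the guarantee labels, and the throughput numbers are all hypothetical illustrations, not measurements of any real product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkResult:
    """A published throughput figure plus the guarantees it was measured under."""
    system: str
    ops_per_sec: float
    guarantees: frozenset  # hypothetical labels, e.g. {"durable", "indexed"}


def differing_guarantees(a: BenchmarkResult, b: BenchmarkResult) -> frozenset:
    """Guarantees provided by exactly one of the two systems (symmetric difference)."""
    return a.guarantees ^ b.guarantees


def speedup(a: BenchmarkResult, b: BenchmarkResult) -> Optional[float]:
    """Ratio of a's throughput to b's, or None if the two did not do equal work."""
    if differing_guarantees(a, b):
        # Unequal work: the ratio says nothing about relative merit.
        return None
    return a.ops_per_sec / b.ops_per_sec


if __name__ == "__main__":
    fast = BenchmarkResult("engine-a", 200_000, frozenset({"indexed"}))
    safe = BenchmarkResult("engine-b", 20_000, frozenset({"indexed", "durable"}))
    safe_too = BenchmarkResult("engine-c", 10_000, frozenset({"indexed", "durable"}))
    print(speedup(fast, safe))               # None: engine-a skipped durability
    print(differing_guarantees(fast, safe))  # frozenset({'durable'})
    print(speedup(safe, safe_too))           # 2.0
```

The tenfold "win" of the hypothetical engine-a disappears as a meaningful number once it is clear that it never paid for durability; only engine-b and engine-c are doing equal work.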
Before digging into the features of
databases, let’s discuss why you would
not just take all of the features. The
short answer is that each feature typically comes with a performance cost, if
not a complexity cost.
Most of the functions performed by
a database, as well as the algorithms
that implement them, are built to work
around the performance bottleneck
that is the hard disk. If you have a requirement that your data (and
metadata) be durable, then you must pay
this penalty one way or another.
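To make that penalty concrete, the rough Python sketch below times the same "operation", appending a small record to a file, with and without forcing each record to stable storage via `os.fsync`. The record size, record count, and function name are arbitrary assumptions for illustration. Absolute numbers depend heavily on the operating system, file system, and drive (on some platforms `fsync` returns before the drive's own write cache is flushed), so treat the output only as a demonstration that an ops/sec figure means little until you know whether each operation was made durable.

```python
import os
import tempfile
import time


def time_writes(n: int, durable: bool, record: bytes = b"x" * 128) -> float:
    """Return elapsed seconds to append n records, optionally fsync-ing after each."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, record)
            if durable:
                # Ask the OS to push this record to stable storage before the next
                # "operation" begins; this is where the disk bottleneck is paid.
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    n = 200
    for durable in (False, True):
        elapsed = time_writes(n, durable)
        label = "write+fsync" if durable else "write only"
        print(f"{label:12s}: {n / elapsed:,.0f} ops/sec")
```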
The Hard Disk
The serial ATA (SATA) bus of a typical
server (Intel Ivy Bridge architecture) has a theoretical maximum bandwidth of 750MB
per second. That seems high, but compare it with the PCI Express (PCIe) 3.0 bus, which has
a maximum of 40GB per second, or the
memory bus, which can do 14.9GB per
second per channel (with at least four
channels). The SATA bus is the lowest-bandwidth data path within a modern
server (excluding peripherals).[5]
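The gap between these buses can be reproduced roughly from interface specifications. The Python sketch below is a back-of-the-envelope check, not the author's derivation. It assumes that the 750MB figure is the SATA 3.0 line rate of 6Gbit/s divided by 8 bits per byte, that the 40GB figure corresponds to 40 PCIe 3.0 lanes at 8GT/s with 128b/130b encoding, and that each memory channel is 64-bit DDR3-1866. Those specific parts are assumptions chosen because they match the quoted numbers.

```python
# Back-of-the-envelope bus bandwidths in GB/s (10**9 bytes per second).
# ASSUMPTIONS (not stated in the article): SATA 3.0 at 6 Gbit/s; PCIe 3.0 at
# 8 GT/s per lane, 128b/130b encoding, 40 lanes; DDR3-1866 on 64-bit channels.

GB = 1e9


def sata3_gbps() -> float:
    # 6 Gbit/s line rate / 8 bits per byte. Ignores 8b/10b encoding, which
    # lowers usable throughput to 600MB/s.
    return 6e9 / 8 / GB


def pcie3_gbps(lanes: int = 40) -> float:
    # 8 GT/s per lane, 128 payload bits per 130 transferred, 8 bits per byte.
    per_lane = 8e9 * 128 / 130 / 8
    return per_lane * lanes / GB


def ddr3_gbps(mt_per_s: float = 1866.67e6, channels: int = 1) -> float:
    # Each transfer moves 8 bytes across a 64-bit channel.
    return mt_per_s * 8 * channels / GB


if __name__ == "__main__":
    sata = sata3_gbps()
    buses = [
        ("SATA 3.0", sata),
        ("PCIe 3.0, 40 lanes", pcie3_gbps()),
        ("DDR3-1866, 4 channels", ddr3_gbps(channels=4)),
    ]
    for name, bw in buses:
        print(f"{name:22s} {bw:6.2f} GB/s  ({bw / sata:5.1f}x SATA)")
```

Under these assumptions, the PCIe figure comes out roughly 52 times the SATA figure, and four memory channels roughly 80 times.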
In addition to the bandwidth bottleneck, there is latency to consider. The
highest-latency operation encountered
within a data center is a seek to a ran-