chitecture that will scale to Pbytes. All
vendors, with very few exceptions, support
or will soon support MPP. Don’t bet on
anything that is not in the MPP camp.
5. “No knobs” is the only thing that
makes any sense.
It is pretty clear that human operational costs dominate the cost of running a data warehouse. This is mainly
the system administration and database administration that is involved in
keeping an MPP system up and in managing a Pbyte-sized warehouse. Database administrator (DBA) costs include
designing schemas, reprovisioning databases to add or drop resources, adding and dropping users, etc.
Almost all DBMSs have 100 or more
complicated tuning “knobs.” This requires DBAs to be “4-star wizards” and
drives up operating costs. The only thing
that makes sense is to have a program
that adjusts these knobs automatically.
In other words, look for “no knobs” as
the only way to cut down DBA costs.
6. Appliances should be “software only.”
In my 40 years of experience as a
computer science professional in the
DBMS field, I have yet to see a specialized hardware architecture—a so-called database machine—that wins.
In other words, one can buy general-purpose CPU cycles from the major
chip vendors or specialized CPU cycles
from a database machine vendor. Since
the volume of the general-purpose vendors is 10,000 or 100,000 times the
volume of the specialized vendors,
their prices are an order of magnitude
under those of the specialized vendor.
To be a price-performance winner, the
specialized vendor must be at least a
factor of 20 to 30 faster.
I have never seen a specialized hardware architecture that is faster by this amount.
Put differently, I think database appliances are a packaging exercise, i.e.,
preconfigure general-purpose hardware and preload the DBMS on it. This
results in a software-only appliance.
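The price-performance argument above can be sketched as simple arithmetic. The function name and the specific speed figures below are illustrative assumptions; only the "order of magnitude" price gap and the 20 to 30 threshold come from the text.

```python
def price_performance(relative_speed: float, relative_price: float) -> float:
    """Work delivered per unit of money, in arbitrary units."""
    return relative_speed / relative_price


# Baseline: general-purpose hardware, normalized to speed 1 and price 1.
commodity = price_performance(relative_speed=1.0, relative_price=1.0)

# Assumption: specialized hardware costs 10x as much (one order of magnitude).
PRICE_RATIO = 10.0

# Hypothetical database machine that is 5x faster: still loses on price-performance.
slow_machine = price_performance(relative_speed=5.0, relative_price=PRICE_RATIO)

# Hypothetical database machine that is 25x faster (inside the 20 to 30 range).
fast_machine = price_performance(relative_speed=25.0, relative_price=PRICE_RATIO)

print("commodity:", commodity)
print("5x faster, 10x price:", slow_machine)
print("25x faster, 10x price:", fast_machine)
```

With a 10x price gap, break-even sits at a 10x speedup; the 20 to 30 figure in the text asks for a clear win rather than a tie.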
Figure: Diagram of a star schema.
Customer (c-key, c-attributes)
Time (t-key, t-attributes)
Fact (c-key, s-key, t-key, p-key, attributes)
task. Hence, HA is used for recovery,
not the DBMS log. Obviously, this requires the DBMS to support HA; otherwise, it is a manual DBA hassle to accomplish the same thing in user code.
9. DBMSs should support online reprovisioning.
Not always, but often, I hear a request for online reprovisioning. In
other words, one initially allocates 10
nodes to accomplish warehouse processing. The load later rises, and the
desire is to allocate 20 nodes to the task
originally done by 10. This requires the
database to be repartitioned over double the number of nodes.
Hardly anybody wants to take the required amount of downtime to dump
and reload the DBMS. A much better
solution is for the DBMS to support reprovisioning, without going offline.
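To see why doubling the node count forces a repartition, here is a minimal sketch that assumes rows are placed by a simple modulo hash on an integer key. The placement rule, the function names, and the 1,000-row table are assumptions for illustration; a real MPP DBMS may partition differently, and the hard part (doing the move without going offline) is not modeled here.

```python
from collections import Counter


def node_for(key: int, num_nodes: int) -> int:
    """Hypothetical placement rule: modulo hash partitioning."""
    return key % num_nodes


def plan_repartition(keys, old_nodes: int, new_nodes: int):
    """Return (key, source_node, destination_node) for every row that must move."""
    moves = []
    for key in keys:
        src = node_for(key, old_nodes)
        dst = node_for(key, new_nodes)
        if src != dst:
            moves.append((key, src, dst))
    return moves


keys = range(1000)  # illustrative table of 1,000 rows
moves = plan_repartition(keys, old_nodes=10, new_nodes=20)

# Row counts per node after the move, to check the load is spread evenly.
rows_per_node = Counter(node_for(k, 20) for k in keys)

print(f"{len(moves)} of {len(keys)} rows must move")
print("distinct per-node row counts after:", set(rows_per_node.values()))
```

Under this placement rule, half of every old node's rows move to one new node, so a DBMS that supports online reprovisioning has to ship that data while queries keep running.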
10. Virtualization often has perfor-
mance problems in a DBMS world.
I hear many users say their long-term
goal is to move to the cloud, whether using the public cloud or inside the firewall on “an enterprise cloud.” Here, a
collection of servers is allotted to several-to-many DBMS applications inside
an enterprise firewall. Often, such systems are managed by using virtualization software to present virtual nodes
to the DBMS and its applications.
My experience is that CPU resources
can be virtualized with modest overhead (say, 10%). However, data warehouses entail disk-based data. In this
world, all MPP DBMSs “move the query
to the data.” Obviously, this requires
knowing the physical data distribution.
Virtualization will destroy this knowledge, and turn what were originally
reads to a local disk into reads to non-local disks. In other words, local I/O
gets replaced by remote I/O, with an obvious and significant performance hit.
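The effect of losing placement knowledge can be sketched with a toy scheduler. Everything here is an assumption for illustration: the eight partitions, the four nodes, and the "blind" assignment rule stand in for whatever a virtualization layer might do. The sketch only counts how many scan fragments end up reading a disk on another node; it does not measure actual I/O cost.

```python
# Hypothetical layout: which node's local disk holds each partition.
placement = {p: p % 4 for p in range(8)}


def locality_aware_plan(placement):
    """Move the query to the data: run each fragment on the owning node."""
    return dict(placement)


def locality_blind_plan(placement, num_nodes):
    """Assumed scheduler that spreads fragments evenly but ignores placement."""
    partitions = sorted(placement)
    per_node = max(1, len(partitions) // num_nodes)
    return {p: (i // per_node) % num_nodes for i, p in enumerate(partitions)}


def remote_reads(plan, placement):
    """Count fragments scheduled on a node that does not hold their partition."""
    return sum(1 for p, node in plan.items() if placement[p] != node)


aware = remote_reads(locality_aware_plan(placement), placement)
blind = remote_reads(locality_blind_plan(placement, 4), placement)

print("remote reads with placement knowledge:", aware)
print("remote reads without placement knowledge:", blind)
```

In this toy layout, the locality-aware plan needs no remote reads, while the blind plan turns most of the scans into remote I/O.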
Until better and cheaper networking makes remote I/O as fast as local
I/O at a reasonable cost, one should be
very careful about virtualizing a DBMS.
Of course, the benefits of a virtualized environment are not insignificant,
and they may outweigh the performance hit. My only point is to note that
virtualizing I/O is not cheap.
Store (s-key, s-attributes)
Product (p-key, p-attributes)
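The star schema in the diagram (Customer, Store, Time, and Product dimension tables around a central Fact table) can be sketched with Python's built-in sqlite3 module. The key columns follow the diagram, with underscores in place of hyphens; the table name time_dim, the attribute columns, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_key INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE store    (s_key INTEGER PRIMARY KEY, s_city TEXT);
CREATE TABLE time_dim (t_key INTEGER PRIMARY KEY, t_year INTEGER);
CREATE TABLE product  (p_key INTEGER PRIMARY KEY, p_name TEXT);
-- The fact table holds one foreign key per dimension plus its own attributes.
CREATE TABLE fact (
    c_key  INTEGER REFERENCES customer(c_key),
    s_key  INTEGER REFERENCES store(s_key),
    t_key  INTEGER REFERENCES time_dim(t_key),
    p_key  INTEGER REFERENCES product(p_key),
    amount REAL
);
""")

# Illustrative rows only.
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO store VALUES (?, ?)", [(1, "Boston"), (2, "Austin")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, 2008), (2, 2009)])
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 10.0), (2, 1, 1, 2, 5.0), (1, 2, 2, 1, 7.5), (2, 2, 2, 2, 2.5)],
)

# A typical warehouse query: join the fact table to two dimensions and aggregate.
rows = conn.execute("""
    SELECT s.s_city, t.t_year, SUM(f.amount)
    FROM fact AS f
    JOIN store    AS s ON f.s_key = s.s_key
    JOIN time_dim AS t ON f.t_key = t.t_key
    GROUP BY s.s_city, t.t_year
    ORDER BY s.s_city, t.t_year
""").fetchall()

for row in rows:
    print(row)
```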
Michael Stonebraker is an adjunct professor at the
Massachusetts Institute of Technology.