RDBMSs allow both a functional interface, if data access is provided via
stored procedures only, and a query-based interface, if the database may be
queried directly. A recent and detailed
survey on cloud storage technologies [16]
proposes a similar classification of
cloud data services, but further differentiates sparse tables into document
stores and extensible record stores. With
respect to the architecture in Figure 1,
some of these services forgo the model
mapping layer, choosing instead to directly expose their underlying model
to consuming applications.
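The distinction drawn above between a functional and a query-based RDBMS interface can be made concrete with a minimal sketch. It assumes an in-memory SQLite database; because SQLite has no stored procedures, fixed Python methods stand in for them here. The table layout and the names `make_db`, `FunctionalInterface`, and `QueryInterface` are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_db():
    """Create a small in-memory customer table (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (cid TEXT PRIMARY KEY, name TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [("C1", "Ann", "CA"), ("C2", "Bob", "NY"), ("C3", "Cy", "CA")],
    )
    return conn


class FunctionalInterface:
    """Callers see only named operations (stand-ins for stored procedures)."""

    def __init__(self, conn):
        self._conn = conn

    def customers_by_state(self, state):
        # The SQL is fixed; the caller supplies only the parameter value.
        cursor = self._conn.execute(
            "SELECT cid, name FROM customer WHERE state = ? ORDER BY cid",
            (state,),
        )
        return cursor.fetchall()


class QueryInterface:
    """Callers may submit arbitrary queries against the underlying tables."""

    def __init__(self, conn):
        self._conn = conn

    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()
```

With the functional wrapper the consumer can only ask the questions the provider anticipated, while the query wrapper exposes the full data model to the consumer.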
To illustrate the various types of cloud data services, we will briefly
examine the Amazon Web Services (AWS) [1] platform, as it has arguably
pioneered the cloud data management effort. Other IT companies are also
moving to build cloud data management frameworks, either for internal
applications (for example, Yahoo!’s PNUTS [20]) or to offer as publicly
available data services (for example, Microsoft’s WCF Data Services, as
made available in Windows Azure [34]).
Key-value stores: The simplest kind of data storage service is the
key-value store, which offers atomic CRUD operations for manipulating
serialized data structures (objects, files, among others) that are
identifiable by a key.
An example of a key-value store is Amazon S3 [1], which provides storage
support for variable-size data blocks, called objects, uniquely identified
by (developer-assigned) keys. Data blocks reside in buckets, which can list
their content and are also the unit of access control. Buckets are treated
as subdomains of s3.amazonaws.com. (For instance, the object customer01.dat
in the bucket custorder can be accessed as
http://custorder.s3.amazonaws.com/customer01.dat.)
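The bucket-as-subdomain addressing described above can be sketched as a pair of helper functions. This is only an illustration of the URL pattern given in the text; the function names `object_url` and `parse_object_url` are hypothetical, and no request is sent to any service.

```python
from urllib.parse import quote, unquote, urlparse

S3_ENDPOINT = "s3.amazonaws.com"  # domain named in the text


def object_url(bucket, key):
    """Build the URL of an object, with the bucket as a subdomain."""
    return f"http://{bucket}.{S3_ENDPOINT}/{quote(key)}"


def parse_object_url(url):
    """Split such a URL back into its (bucket, key) pair."""
    parts = urlparse(url)
    host = parts.hostname or ""
    suffix = "." + S3_ENDPOINT
    if not host.endswith(suffix):
        raise ValueError(f"not a bucket subdomain of {S3_ENDPOINT}: {url}")
    bucket = host[: -len(suffix)]
    key = unquote(parts.path[1:])  # drop the leading "/"
    return bucket, key


print(object_url("custorder", "customer01.dat"))
# prints http://custorder.s3.amazonaws.com/customer01.dat
```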
The most common operations in
S3 are:
• create (and name) a bucket,
• write an object, by specifying its key, and optionally an access control
  list for that object,
• read an object,
• delete an object, and
• list the keys contained in one of the buckets.
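The operations listed above can be mimicked by a small in-memory model. This is a sketch of the interface shape only, not the S3 API or a client for it; the class and method names (`ObjectStore`, `Bucket`, `put`, `get`, `delete`, `list_keys`) are assumptions made for illustration, and the access control list is stored but not enforced.

```python
class Bucket:
    """A named container of objects, each identified by a developer-assigned key."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # key -> (data bytes, optional access control list)

    def put(self, key, data, acl=None):
        """Write an object under a key, optionally with an access control list."""
        self._objects[key] = (bytes(data), list(acl) if acl is not None else None)

    def get(self, key):
        """Read an object; raises KeyError if the key is absent."""
        return self._objects[key][0]

    def delete(self, key):
        """Delete an object; deleting a missing key is a no-op in this sketch."""
        self._objects.pop(key, None)

    def list_keys(self):
        """List the keys contained in this bucket, in sorted order."""
        return sorted(self._objects)


class ObjectStore:
    """Holds buckets by name (a stand-in for the storage service)."""

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, name):
        """Create and name a bucket; names must be unique."""
        if name in self._buckets:
            raise ValueError(f"bucket already exists: {name}")
        self._buckets[name] = Bucket(name)
        return self._buckets[name]


store = ObjectStore()
bucket = store.create_bucket("custorder")
bucket.put("customer01.dat", b"serialized customer record", acl=["alice"])
print(bucket.list_keys())  # prints ['customer01.dat']
```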

Integrated Data Services
Here, we dive a bit deeper into the ODSI example to more fully convey ODSI’s
approach to data service integration. In particular, the accompanying figure
presents a screen shot that shows how the ODSI graphical query editor would
be used in the example scenario when the customer data source is a relational
customer table and the orders and line items are coming from an order
management Web service that takes a customer id as input and returns the orders
for that customer as output.
Let us take a closer look at the figure. The upper left box in the query editor
screen shot represents the Customer physical data service, which returns elements
containing the CID, NAME, and STATE information for customers. The middle box
represents the Web service call GetOrdersByCid(), which takes an XML input
containing the cid of a customer and returns an XML result containing the orders
(including the nested line items) for this customer. The artifact being constructed
is itself an XQuery query that will be the body of a callable data service function
(a function called getAllCustomers()), and the box on the far right of the screen
shot represents the XML result structure for the function. The rest of the screen
shot is a graphical specification of how the data from the two physical data services
is to be mapped and combined. The resulting getAllCustomers() data service
function can then be made available as a callable Web service by simply instructing
ODSI to do so through a few additional mouse clicks.
More information about ODSI’s approach to data integration and its data service
authoring tools may be found in Carey et al. [13] and Oracle [38].
Integrating RDBMS and Web service data sources.

Data blocks in S3 were designed to hold large objects (multimedia files),
but they can potentially be used as database pages storing (several) records.
Brantner et al. [11] started from this observation and analyzed the trade-offs
in building a database system over S3.
However, recent industrial trends are favoring the incorporation of more DBMS
functionality directly into the cloud infrastructure, thus offering a higher-level
interface to the application.
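The idea of using a stored object as a database page holding several records can be sketched as follows. This is an illustrative assumption about how such a layout might look, not the design analyzed by Brantner et al.; the page capacity, the key naming scheme, and the function names are hypothetical, and a plain dictionary stands in for a bucket.

```python
import json

PAGE_CAPACITY = 4  # records per page; hypothetical value for illustration


def page_key(table, page_no):
    """Key under which a page object is stored (hypothetical naming scheme)."""
    return f"{table}/page-{page_no:06d}"


def write_records(store, table, records):
    """Group records into fixed-capacity pages; each page becomes one object."""
    for start in range(0, len(records), PAGE_CAPACITY):
        page = records[start:start + PAGE_CAPACITY]
        key = page_key(table, start // PAGE_CAPACITY)
        store[key] = json.dumps(page).encode("utf-8")


def read_record(store, table, record_no):
    """Fetch the whole page object, then pick the requested record from it."""
    page_no, slot = divmod(record_no, PAGE_CAPACITY)
    page = json.loads(store[page_key(table, page_no)].decode("utf-8"))
    return page[slot]


bucket = {}  # stand-in for a bucket: key -> object bytes
customers = [{"cid": f"C{i}", "state": "CA"} for i in range(10)]
write_records(bucket, "customer", customers)
print(len(bucket))  # prints 3 (ten records in pages of four)
```

Reading a single record here still transfers a whole page object, which hints at the kind of trade-off involved when a store built for large objects is used for record-level access.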
Dynamo [21] is another well-known
Amazon example of a key-value store.
It differs in granularity from S3 since it
stores only objects of a relatively small
size (< 1 MB), whereas data blocks in S3
may go up to 5 GB.
Sparse tables are a new paradigm of
storage management for structured
and semi-structured data that has
emerged in recent years, especially after the interest generated by Google’s
Bigtable [18]. (Bigtable is the storage
system behind many of Google’s applications and is exposed, via APIs,
to Google App Engine [25] developers.)
A sparse table is a collection of data
records, each one having a row and a
set of column identifiers, so that at the
logical level records behave like the
rows of a table. There may be little or
no overlap between the columns used