usage monitoring, group membership, data storage (such as distributed file systems and key-value lookup services), distributed agreement (consensus), and locking.
The application resource management layer manages the allocation of physical resources to the actual applications and platforms,
including higher-level service abstractions (virtual machines) offered
to end users. The management layer deals with problems related to
application placement, load balancing, task scheduling, service-level agreements, and others.
Finally, we enumerate some cross-cutting concerns that span the
entire cloud infrastructure. We will focus on these issues: energy, privacy, consistency, and the lack of standards, benchmarks, and test beds
for conducting cloud-related research.
Energy
Large cloud providers are natural power hogs. To reduce the carbon
footprint, data centers are frequently deployed in proximity to hydroelectric plants and other clean energy sources. Microsoft, Sun, and
Dell have advocated putting data centers in shipping containers consisting of several thousand nodes at a time, thus making deployment
easier. Although multi-tenancy and the use of virtualization improve
resource utilization over traditional data centers, the growth of cloud
provider services has been rapid, and power consumption is a major
operating expense for the large industry leaders.
Fundamental questions remain about how, where, and at what cost
we can reduce power consumption in the cloud. Here we examine three
examples to illustrate potential directions.
Solid-state disks (SSDs) have substantially faster access times and
draw less power than regular mechanical disks. The downside is that
SSDs are more expensive and lack durability because blocks can
become corrupted after 100,000 to 1,000,000 write-erase cycles. SSDs
have made their way into the laptop market; the next question is
whether cloud data centers will follow [14]. Can we engineer mechanisms to store read-intensive data on SSDs instead of disks?
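One way to picture such a mechanism is a placement rule driven by observed access counts. The sketch below is only an illustration: the ObjectStats record, the choose_tier function, and both thresholds are hypothetical and not taken from any cited system.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    """Observed access counts for one stored object (hypothetical record)."""
    name: str
    reads: int
    writes: int


def choose_tier(stats, min_read_ratio=0.9, min_accesses=100):
    """Return "ssd" for frequently accessed, read-dominated objects, else "disk".

    Both thresholds are illustrative assumptions. Write-heavy objects stay on
    disk so that they do not consume the SSD's limited write-erase cycles.
    """
    total = stats.reads + stats.writes
    if total < min_accesses:
        return "disk"  # too cold to justify the more expensive SSD space
    return "ssd" if stats.reads / total >= min_read_ratio else "disk"


catalog = [
    ObjectStats("product-images", reads=9800, writes=200),
    ObjectStats("write-ahead-log", reads=50, writes=9950),
    ObjectStats("old-archive", reads=3, writes=1),
]
placement = {obj.name: choose_tier(obj) for obj in catalog}
print(placement)
```

A real policy would also have to weigh SSD capacity, migration cost, and how quickly access patterns change.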
Google has taken steps to revamp energy use in hardware by producing custom power supplies for computers that have more than
double the efficiency of regular ones [12]. They even patented a
“water-based” data center on a boat that harnesses energy from ocean
tides to power the nodes and also uses the sea for cooling. How can
we better design future hardware and infrastructure for improved
energy efficiency? How can we minimize energy loss in the commodity machines currently deployed in data centers?
In the same fashion that laptop processors adapt the CPU frequency
to the workload being performed, data center nodes can be powered
up or down to adapt to variable access patterns, for example, due to
diurnal cycles or flash crowds. Some CPUs and disk arrays have more
flexible power management controls than simple on/off switches, thus
permitting intermediate levels of power consumption [13]. File systems
spanning multiple disks could, for instance, bundle infrequently
accessed objects together on “sleeper” disks [9]. More generally, how
should data and computation be organized on nodes to permit software to decrease energy use without reducing performance?
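As a rough illustration of powering nodes up or down with the load, the sketch below computes how many nodes to keep active for each hour of a diurnal request pattern. The per-node capacity, the 20 percent headroom, and the load figures are invented for the example and do not come from measurements.

```python
import math


def nodes_needed(requests_per_sec, capacity_per_node, headroom=0.2, min_nodes=1):
    """Number of nodes to keep powered on for a given request rate.

    The headroom is spare capacity reserved for sudden spikes such as
    flash crowds. All parameters are illustrative assumptions.
    """
    required = requests_per_sec * (1 + headroom) / capacity_per_node
    return max(min_nodes, math.ceil(required))


# Hypothetical request rates sampled over part of a day (requests per second).
hourly_load = [200, 150, 120, 300, 900, 1500, 1800, 1200]
plan = [nodes_needed(rate, capacity_per_node=250) for rate in hourly_load]
print(plan)  # [1, 1, 1, 2, 5, 8, 9, 6]
```

The sketch ignores the time and energy needed to wake a node and the data that must stay reachable while nodes sleep, which is where the open question about organizing data and computation comes in.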
Privacy Concerns
Storing personal information in the cloud clearly raises privacy and security concerns. Sensitive data are no longer protected by physical obscurity or obstructions. Instead, exact copies can be made in an instant.
Technological advances have reduced the ability of an individual to
exercise control over his or her personal information, making
privacy within clouds difficult to define [5]. The companies that
gather information to deliver targeted advertisements are working
toward their ultimate product: you. The amount of information
known by large cloud providers about individuals is staggering, and
the lack of transparent knowledge about how this information is used
has provoked concerns.
Are there reasonable notions of privacy that would still allow businesses to collect and store personal information about their customers
in a trustworthy fashion? How much are users willing to pay for additional privacy?
We could trust the cloud partially, while implementing mechanisms for auditing and accountability. If privacy leaks have serious
legal repercussions, then cloud providers would have incentives to
deploy secure information flow techniques (even if they are heavy-handed) to limit access to sensitive data and to devise tools to identify
those responsible if a breach is detected [17]. How can such
mechanisms be made practical? Is the threat of penalty to those individuals who are caught compromising privacy satisfactory, or should
the cloud be considered an untrusted entity altogether?
If we choose not to trust the cloud, then one avenue of research is
to abstract it as a storage and computing device for encrypted information. We could use a recent invention in cryptography called
fully homomorphic encryption [10], a scheme that allows addition and
multiplication (and hence arbitrary Boolean circuits) to be performed
on encrypted data without decrypting it first. Unfortunately,
the first implementations are entirely impractical, which raises the question
of whether homomorphic encryption can be made practical.
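To give a feel for computing on encrypted data, the sketch below uses a toy version of the Paillier cryptosystem. Paillier is only additively homomorphic, so this is not the fully homomorphic scheme of [10], and the tiny primes are chosen for readability and offer no security. The function names are hypothetical, and pow with a negative exponent assumes Python 3.8 or later.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation from two primes, using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam modulo n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m (0 <= m < n) under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the plaintext sum."""
    return (c1 * c2) % (n * n)


n, private_key = keygen(1009, 1013)  # toy primes, far too small for real use
c_sum = add_encrypted(n, encrypt(n, 1200), encrypt(n, 34))
total = decrypt(n, private_key, c_sum)
print(total)  # 1234
```

The party that calls add_encrypted never sees 1200, 34, or their sum. A fully homomorphic scheme additionally supports multiplication of plaintexts, which is what makes arbitrary circuits possible.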
Another approach is to sacrifice the generality of homomorphic
encryption. We can identify the most important functions that need to
be computed on the private data and devise a practical encryption
scheme to support these functions: think MapReduce [7] on encrypted
data. As a high-level example, if all emails in Gmail were encrypted with
the user’s public key and decrypted by the user’s web browser, then
Gmail could not produce a search index for the mailbox. However, if
each individual word in the email were encrypted, Gmail could produce
an index (the encrypted words would just look like a foreign language)
but would not understand the message contents.
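The per-word idea can be sketched as follows. As an assumption made for simplicity, each word is replaced by a deterministic keyed token (HMAC-SHA256) rather than by a true per-word encryption, so the server can index and match words but cannot read them. The key, the mailbox contents, and the function names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def word_token(key, word):
    """Deterministic keyed token for a word; opaque without the user's key."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize_message(key, text):
    """Client side: turn a message into the set of tokens that gets uploaded."""
    return {word_token(key, word) for word in text.split()}


def build_index(tokenized_mailbox):
    """Server side: inverted index from token to message ids, no key required."""
    index = defaultdict(set)
    for msg_id, tokens in tokenized_mailbox.items():
        for token in tokens:
            index[token].add(msg_id)
    return index


user_key = b"hypothetical-user-secret"
mailbox = {1: "Lunch on Friday", 2: "Budget review Friday", 3: "Flight itinerary"}
uploaded = {mid: tokenize_message(user_key, text) for mid, text in mailbox.items()}
index = build_index(uploaded)

# The client derives the token for its query; the server only matches tokens.
hits = index.get(word_token(user_key, "friday"), set())
print(sorted(hits))  # [1, 2]
```

Because equal words always map to equal tokens, the server still learns word frequencies and co-occurrence, which is one concrete point on the privacy versus functionality spectrum.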
The latter case implies that Gmail could not serve targeted ads to
the user. What are the practical points on the privacy versus functionality spectrum with respect to computational complexity and a feasible
cloud business model? Secure multiparty computation (SMC) allows
mutually distrusting agents to compute a function on their collective
inputs without revealing their inputs to other agents [19]. Could we
partition sensitive information across clouds, perhaps including a
trusted third-party service, and perform SMC on the sensitive data?
Is SMC the right model?
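A minimal flavor of SMC is a secure sum built on additive secret sharing, sketched below. It assumes honest-but-curious parties that follow the protocol, and it simulates all parties in one process. The modulus, the function names, and the input values are illustrative assumptions, not part of any protocol described in [19].

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value


def share(secret, n_parties):
    """Split a secret into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(inputs):
    """Simulate a secure sum in which no party sees another party's input.

    dealt[i][j] is the share that party i sends to party j. Each party adds
    the shares it receives and publishes only that partial sum.
    """
    n = len(inputs)
    dealt = [share(value, n) for value in inputs]
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


salaries = [52000, 61000, 48000]  # one private input per party (made-up values)
total = secure_sum(salaries)
print(total)  # 161000
```

Each individual share is uniformly random, so a party learns nothing about another input from the shares it receives, although the final sum is revealed to everyone by design.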
Consistency
In a broad sense, consistency governs the semantics of accessing the
cloud-based services as perceived by both the developers and end
users. The consistency issues are particularly relevant to the distributed
computing infrastructure services (see Figure 1), such as data storage.
The most stringent consistency semantics, known as serializability
or strong consistency [11], globally orders the service requests and
Crossroads
www.acm.org/crossroads
Spring 2010/ Vol. 16, No. 3