S→R and binary operator ⊕ Î R´R→R.
In fact, Richard Bird’s first homomorphism lemma3 says that any function
h Î M<S>→R is a homomorphism
with respect to ∪ if and only if it can
be factored into a map followed by a
reduce: h(xs) = xs.Select(f).Ag-gregate(⊕). Mathematics dictates
that coSQL queries are performed using MapReduce.
6
Practical implementations of MapReduce usually slightly generalize
Bird’s lemma to use SelectMany
instead of Select so that the map
phase can return a collection instead
of a single value, and insert an intermediate GroupBy as a way to “write”
equivalence classes of values from the
map phase into the key-value store for
subsequent processing in the reduce
phase, and then aggregate over each
subcollection:
xs.SelectMany(f).GroupBy(s).Select((k,g)
⇒
g.Aggregate(⊕k))
For example, DryadLINQ15 uses the
type PartitionedTable<S>:IQuer
yable<S> to represent the partitioned
input collection for LINQ queries and
then implements MapReduce over the
partitioned collection using the function illustrated in Figure 15.
In an open world where collections
are distributed across the network, it is
much harder for a query optimizer to
perform a global optimization taking
into account latency, errors, etc. Hence,
most coSQL databases rely on explicit
programmatic queries of a certain pattern such as MapReduce that can be
executed reliably on the target physical
machine configuration or cluster.
Conclusion
The nascent noSQL market is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing
noSQL solutions requires specialized
and low-level knowledge that does not
easily carry over from one vendor’s
product to another.
A necessary condition for the net-
work effect to take off in the noSQL
database market is the availability of a
common abstract mathematical data
model and an associated query lan-
guage for noSQL that removes product
differentiation at the logical level and
instead shifts competition to the phys-
ical and operational level. The avail-
ability of such a common mathemati-
cal underpinning of all major noSQL
databases can provide enough critical
mass to convince businesses, develop-
ers, educational institutions, etc. to
invest in noSQL.
acknowledgments
Many thanks to Brian Beckman, Jimmy “the aggregator” Nilsson, Bedarra-Dave Thomas, Ralf Lämmel, Torsten
Grust, Maarten Fokkinga, Rene Bouw,
Alexander Stojanovic, and the anonymous referee for their comments that
drastically improved the presentation
of this paper, and of course to Dave
Campbell for supporting work on all
cool things LINQ.
Related articles
on queue.acm.org
A Conversation with
Erik Meijer and Jose Blakeley
http://queue.acm.org/detail.cfm?id=1394137
BASE: An Acid Alternative
Dan Pritchett
http://queue.acm.org/detail.cfm?id=1394128
Bridging the Object-Relational Divide
Craig Russell
http://queue.acm.org/detail.cfm?id=1394139
References
1. awodey, s. Category Theory (2nd edition). oxford
university Press, 2010.
2. baker, J., bond, c. et al. Megastore: providing scalable,
highly available storage for interactive services.
Conference on Innovative Data Systems Research.
(2011).
3. bird, r. an introduction to the theory of lists. In Logic
Programming and Calculi of Discrete Design. M. broy,
ed. springer-Verlag (1987), 3–42.
4. codd, t. a relational model of data for large shared
data banks. Commun. ACM 13 (June 1970).
5. copeland, G.and Maier, d. Making smalltalk a
database system. In Proceedings of the ACM SIGMOD
International Conference on Management of Data.
1984.
6. Fokkinga, M. Mapreduce—a two-page explanation
for laymen; http://www.cs.utwente.nl/~fokkinga/
mmf2008j.pdf.
7. Ghosh, r.a. an economic basis for open standards
(2005); flosspols.org.
8. Grust, t. 2003. Monad comprehensions: a versatile
representation for queries. In The Functional
Approach to Data Management. P. Gray, l. kerschberg,
P. king, and a. Poulovassilis, eds. springer Verlag,
2003, 288–311.
9. Grust, t., rittinger, J., schreiber, t. avalanche-safe lInQ compilation. Proceedings of the VLDB
Endowment 3 ( 1–2), 2010.
10. Meijer, e. subject/observer is dual to iterator.
Presented at FIt: Fun Ideas and thoughts at the
conference on Programming language design and
Implementation (2010); http://www.cs.stanford.edu/
pldi10/ fit.html.
11. Meijer, e., beckman, b., bierman, G. lInQ: reconciling
objects, relations, and XMl in the .net framework.
Proceedings of the ACM SIGMOD International
Conference on Management of Data. acM, new york,
2006.
12. Pirayoff, r. Economics Micro & Macro. cliffs notes,
2004.
13. Pritchett, d. base: an acid alternative. ACM Queue
(July 2008).
14. stonebraker, M., hellerstein, J.M. what goes around
comes around. In Readings in Database Systems
(Fourth edition). M. stonebraker, and J.M. hellerstein,
eds. MIt Press, cambridge, Ma, 2005, 2–41.
15. yuan yu, M. I. dryadlInQ: a system for general-purpose distributed data-parallel computing using a
high-level language. Operating Systems Design and
Implementation. 2008.
Erik Meijer ( emeijer@microsoft.com) has been working
on “democratizing the cloud” for the past 15 years.
he is perhaps best known for his work on the haskell
language and his contributions to lInQ and the reactive
Framework (rx).
Gavin Bierman ( gmb@microsoft.com) is a senior
researcher at Microsoft research cambridge focusing
on database query languages, type systems, semantics,
programming language design and implementation, data
model integration, separation logic, and dynamic software
updating.
© 2011 acM 0001-0782/11/04 $10.00