// value: document contents
for each word w in value:
EmitIntermediate(w, “ 1”);
reduce(String key, Iterator values):
// key: a word
// values: a list of counts
int result = 0;
for each v in values:
result += ParseInt(v);
Emit(AsString(result));
The map function emits each word plus an associated count of
occurrences (just 1 in this simple example). The reduce function
sums together all counts emitted for a particular word.
In addition, the user writes code to fill in a mapreduce specification
object with the names of the input and output files and optional tuning parameters. The user then invokes the MapReduce function, passing it to the specification object. The user’s code is linked together
with the MapReduce library (implemented in C++). Our original
MapReduce paper contains the full program text for this example [ 8].
More than ten thousand distinct programs have been implemented
using MapReduce at Google, including algorithms for large-scale
graph processing, text processing, data mining, machine learning, statistical machine translation, and many other areas. More discussion of
specific applications of MapReduce can be found elsewhere [ 8, 16, 7].
2. 2 Types
Even though the previous pseudocode is written in terms of string
inputs and outputs, conceptually the map and reduce functions supplied by the user have associated types.
→ list(k2,v2)
→ list(v2)
map
reduce
(k1,v1)
(k2,list(v2))
That is, the input keys and values are drawn from a different domain
than the output keys and values. Furthermore, the intermediate keys
and values are from the same domain as the output keys and values.
3. Implementation
Many different implementations of the MapReduce interface are possible. The right choice depends on the environment. For example, one
implementation may be suitable for a small shared-memory machine,
another for a large NUMA multiprocessor, and yet another for an even
larger collection of networked machines. Since our original article, several open source implementations of MapReduce have been developed
[ 1, 2], and the applicability of MapReduce to a variety of problem
domains has been studied [ 7, 16].
This section describes our implementation of MapReduce that is targeted to the computing environment in wide use at Google: large clusters
of commodity PCs connected together with switched Gigabit Ethernet
[ 4]. In our environment, machines are typically dual-processor x86
processors running Linux, with 4-8GB of memory per machine.
Individual machines typically have 1 gigabit/second of network bandwidth, but the overall bisection bandwidth available per machine is con-
siderably less than 1 gigabit/second. A computing cluster contains many
thousands of machines, and therefore machine failures are common.
Storage is provided by inexpensive IDE disks attached directly to individual machines. GFS, a distributed file system developed in-house [ 10], is
used to manage the data stored on these disks. The file system uses replication to provide availability and reliability on top of unreliable hardware.
Users submit jobs to a scheduling system. Each job consists of a
set of tasks, and is mapped by the scheduler to a set of available
machines within a cluster.
3. 1 Execution Overview
The map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits. The input
splits can be processed in parallel by different machines. Reduce invocations are distributed by partitioning the intermediate key space into R
pieces using a partitioning function (e.g., hash(key) mod R). The number
of partitions (R) and the partitioning function are specified by the user.
Figure 1 shows the overall flow of a MapReduce operation in our
implementation. When the user program calls the MapReduce function, the following sequence of actions occurs (the numbered labels in
Figure 1 correspond to the numbers in the following list).
1. The MapReduce library in the user program first splits the input files
into M pieces of typically 16-64MB per piece (controllable by the
user via an optional parameter). It then starts up many copies of the
program on a cluster of machines.
2. One of the copies of the program—the master— is special. The rest
are workers that are assigned work by the master. There are M map
tasks and R reduce tasks to assign. The master picks idle workers and
assigns each one a map task or a reduce task.
3. A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and
passes each pair to the user-defined map function. The intermediate
key/value pairs produced by the map function are buffered in memory.
4. Periodically, the buffered pairs are written to local disk, partitioned
into R regions by the partitioning function. The locations of these
buffered pairs on the local disk are passed back to the master who
is responsible for forwarding these locations to the reduce workers.
5. When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from
the local disks of the map workers. When a reduce worker has read
all intermediate data for its partition, it sorts it by the intermediate
keys so that all occurrences of the same key are grouped together.
The sorting is needed because typically many different keys map to
the same reduce task. If the amount of intermediate data is too large
to fit in memory, an external sort is used.
6. The reduce worker iterates over the sorted intermediate data and for
each unique intermediate key encountered, it passes the key and the
corresponding set of intermediate values to the user’s reduce function. The output of the reduce function is appended to a final output file for this reduce partition.