repeated a fixed number of times or until the match graph
converges.
4. 3. Distributed implementation
We now consider a distributed implementation of the ideas
described above. Our matching system is divided into three
distinct phases: ( 1) pre-processing (Section 4. 3. 1), ( 2) verification (Section 4. 3. 2), and ( 3) track generation (Section
4. 3. 3). The system runs on a cluster of computers (nodes)
with one node designated as the master node, responsible
for job scheduling decisions.
4. 3. 1. Preprocessing and feature extraction
We assume that the images are available on a central store
from which they are distributed to the cluster nodes on
demand in chunks of fixed size. Each node down-samples
its images to a fixed size and extracts SIFT features. This
automatically performs load balancing, with more powerful nodes receiving more images to process. This is the only
stage requiring a central file server; the rest of the system
operates without using any shared storage.
At the end of this stage, the set of images (along with their
features) has been partitioned into disjoint sets, one for
each node.
4. 3. 2. Verification and detailed matching
The next step is to propose and verify (via feature matching)
candidate image pairs, as described in Section 3.
For the first two rounds of matching, we use the whole
image similarity (Section 4. 1), and for the next four rounds
we use query expansion (Section 4. 2).
If we consider the TFIDF vectors corresponding to the
images to be the rows of a huge matrix T, then the process
of evaluating the whole image similarity is equivalent to
evaluating the outer product S = TT ′. Each node in the cluster evaluates the block of rows corresponding to its images,
chooses the top k1 + k2 entries in each row and reports them
to the master node. Query expansion is a simple and cheap
enough operation that we let the master node generate
these proposals.
If the images were all located on a single machine, verifying each proposed pair would be a simple matter of running through the set of proposals and performing SIFT
matching, perhaps paying some attention to the order of
the verifications so as to minimize disk I/O. However, in
our case, the images and features are distributed across
the cluster. Asking a node to match the image pair (i, j)
may require it to fetch the image features from two other
nodes of the cluster. This is undesirable due to the large
difference between network transfer speeds and local disk
transfers, as well as creating work for three nodes. Thus,
the candidate edge verifications should be distributed
across the network in a manner that respects the locality
of the data.
We experimented with a number of approaches with surprising results. Initially, we tried to optimize network transfers before performing any verification. In this setup, once the
master node knows all the image pairs that need to be verified,
it builds another graph connecting image pairs which share
108 communications of the acm | oCTobEr2011 | voL. 54 | No. 10
an image. Using Me TiS,
12 this graph is partitioned into as
many pieces as there are compute nodes. Partitions are then
matched to the compute nodes by solving a linear assignment
problem that minimizes the number of network transfers
needed to send the required files to each node.
This algorithm worked well for small problems, but
not for large ones. Our assumption that verifying every
pair of images takes the same constant amount of time
was wrong; some nodes finished early and idled for up to
an hour.
Our second idea was to over-partition the graph into
small pieces, then parcel them out to nodes on demand.
When a node requests a chunk of work, it is assigned the
piece requiring the fewest network transfers. This strategy
achieved better load balancing, but as the problem sizes
grew, the graph we needed to partition became enormous
and partitioning itself became a bottleneck.
The approach that gave the best result was to use a
simple greedy bin-packing algorithm where each bin represents the set of jobs sent to a node. The master node
maintains a list of images on each node. When a node asks
for work, it runs through the list of available image pairs,
adding them to the bin if they do not require any network
transfers, until either the bin is full or there are no more
image pairs to add. It then chooses an image (list of feature vectors) to transfer to the node, selecting the image
that will allow it to add the maximum number of image
pairs to the bin. This process is repeated until the bin is
full. A drawback of this algorithm is that it can require multiple sweeps over all the remaining image pairs: for large
problems this can be a bottleneck. A simple solution is to
consider only a fixed sized subset of the image pairs for
scheduling. This windowed approach works very well in
practice and our experiments use this method.
4. 3. 3. Track generation
Until now, we have only compared two images at a time.
However, when a 3D point is visible in more than two
images and the features corresponding to this point have
been matched across these images, we need to group these
features together so that the geometry estimation algorithm
can estimate a single 3D point from all the features. We call
a group of features corresponding to a single 3D point a
feature track (Figure 2); the final step in the matching process
is to combine all the pairwise matching information to
figure 2: a track corresponding to a point on the face of the central
statue of oceanus (the embodiment of a river encircling the world in
Greek mythology).