own experiments to determine how
motion was reflected in CDRs. We ignored text CDRs because text messages
involve only a single location. Second,
since we were interested in routes to
and from the center of town, we used
only CDRs with antenna sequences
that began or ended at the tower handling calls for the core downtown area.
After filtering, we were still left with
tens of thousands of CDRs.
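As a rough illustration of the two filters used for the traffic study, the sketch below keeps a CDR only if it is a voice record, involves antennas on at least five distinct towers, and begins or ends at the downtown tower. The `CDR` record, its field names, the tower identifiers, and the function names are hypothetical; real CDR schemas differ by operator and carry many more fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CDR:
    # Hypothetical record layout for illustration; real CDR schemas vary by operator.
    kind: str  # "voice" or "text"
    antennas: list[tuple[str, int]] = field(default_factory=list)  # (tower_id, sector) in order of use


def distinct_towers(cdr: CDR) -> int:
    """Number of different towers whose antennas handled the call."""
    return len({tower for tower, _sector in cdr.antennas})


def keep_for_route_study(cdr: CDR, downtown_tower: str, min_towers: int = 5) -> bool:
    """Keep moving voice calls whose antenna sequence starts or ends downtown."""
    if cdr.kind != "voice" or not cdr.antennas:
        return False  # text CDRs carry only a single location
    if distinct_towers(cdr) < min_towers:
        return False  # too few towers to indicate a moving vehicle
    first_tower, last_tower = cdr.antennas[0][0], cdr.antennas[-1][0]
    return downtown_tower in (first_tower, last_tower)
```

Counting distinct towers rather than antenna hand-offs avoids treating a stationary caller who bounces between sectors of one tower as a moving vehicle.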
We began by identifying 15 common commuting routes (13 driving
routes and two train routes) radiating
from the town center. We obtained
ground-truth data for them by driving or
riding each one four times (two in each
direction), using at least two phones
calling each other on each drive/ride.
We obtained the CDRs for these calls
to both train and test our algorithms.
From our training data, we determined
a reference pattern of cellular sectors
used by calls on each of the routes. We
intentionally included some routes
very close to one another and others
that partially overlap, as routes do in
real life. Some of our reference patterns were thus quite similar, making
disambiguation a challenge.
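One simple way to turn a route's training calls into a reference pattern, assumed here for illustration, is to record the share of call time spent on each sector and average those shares across the training calls. The `(sector_id, seconds)` call format and the function names are hypothetical, and this representation drops the sector order that the article's distance metric also takes into account.

```python
from __future__ import annotations

from collections import Counter


def time_profile(call: list[tuple[str, float]]) -> dict[str, float]:
    """Fraction of one call's duration spent on each sector.

    `call` is a hypothetical list of (sector_id, seconds) pairs.
    """
    totals: Counter = Counter()
    for sector, seconds in call:
        totals[sector] += seconds
    total = sum(totals.values())
    if total == 0:
        return {}
    return {sector: t / total for sector, t in totals.items()}


def reference_pattern(training_calls: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Average the per-call profiles so every training call carries equal weight."""
    pattern: Counter = Counter()
    for call in training_calls:
        for sector, share in time_profile(call).items():
            pattern[sector] += share / len(training_calls)
    return dict(pattern)
```

Normalizing each call before averaging keeps one unusually long training call from dominating the route's pattern.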
We then developed two methods
for assigning CDRs to routes: One
uses a distance metric to assign a test
CDR to the route with the closest reference pattern. We used a variant of
Earth Mover’s Distance (EMD), a measure of the difference between two
arbitrary probability distributions, as
a metric that takes into account common subsets of sectors, the particular
sequence of sectors, how long the call
is associated with each sector, and
tower locations. The other method
uses as reference data the radio-frequency scans routinely performed by
cellular network operators to measure
network coverage. The scanner data
contains signal-strength measurements, stamped with Global Positioning System (GPS) locations, from all observable antennas along major driving
routes. Our classification algorithm
estimates the likelihood of a given sequence of antennas being seen on a
particular route and selects the most
likely route. This approach has the
advantage of reusing data
that is already available, without requiring additional data collection on
every target route. It could easily be ex-
Figure 4. Paradeshed of Morristown, NJ; the red dot denotes the city center.
Five times as many people were in Morristown for the St. Patrick's Day Parade as on a normal Saturday.
To show the geographical distribution of parade attendees' homes, we mapped the number of people
coming from each surrounding ZIP code. Green-yellow areas contributed more than the parade-day
average and purple-red areas less than that average. Communities contributing near the average
are left uncolored so the outliers stand out.
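One plausible reading of the coloring in Figure 4 can be sketched as follows: compute each ZIP code's increase factor (parade-day count divided by typical-Saturday count), compare it with the overall increase across all ZIP codes, and leave codes within a tolerance band uncolored. The 20 percent tolerance, the function name, and the toy counts in any usage are assumptions, not values from the study.

```python
from __future__ import annotations


def classify_zip_increases(
    parade_counts: dict[str, int],
    baseline_counts: dict[str, int],
    tolerance: float = 0.2,  # hypothetical width of the "near average" band
) -> tuple[float, dict[str, str]]:
    """Label each ZIP code by how its parade-day increase compares with the overall increase."""
    overall = sum(parade_counts.values()) / sum(baseline_counts.values())
    labels: dict[str, str] = {}
    for zip_code, base in baseline_counts.items():
        if base == 0:
            continue  # increase factor is undefined without a baseline
        factor = parade_counts.get(zip_code, 0) / base
        if factor > overall * (1 + tolerance):
            labels[zip_code] = "above"  # green-yellow on the map
        elif factor < overall * (1 - tolerance):
            labels[zip_code] = "below"  # purple-red on the map
        else:
            labels[zip_code] = "near"  # left uncolored
    return overall, labels
```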
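For the distance-based route assignment described in the traffic study, a deliberately simplified sketch of Earth Mover's Distance is shown below. It assumes each sector can be placed on a single axis (for example, a hypothetical distance in kilometers from the downtown tower); in one dimension, EMD between two distributions of equal total mass reduces to the area between their cumulative distributions. The authors' variant also accounts for sector order, call duration per sector, and two-dimensional tower geometry, which this sketch does not; the function names are hypothetical.

```python
from __future__ import annotations


def emd_1d(p: dict[str, float], q: dict[str, float], position: dict[str, float]) -> float:
    """Earth Mover's Distance between two sector distributions placed on one axis.

    Assumes p and q each sum to 1. Walking the sectors in positional order,
    accumulate |CDF_p - CDF_q| times the gap to the next sector.
    """
    sectors = sorted(set(p) | set(q), key=lambda s: position[s])
    distance = 0.0
    cdf_gap = 0.0
    for here, nxt in zip(sectors, sectors[1:]):
        cdf_gap += p.get(here, 0.0) - q.get(here, 0.0)
        distance += abs(cdf_gap) * (position[nxt] - position[here])
    return distance


def nearest_route(
    test: dict[str, float],
    references: dict[str, dict[str, float]],
    position: dict[str, float],
) -> str:
    """Assign a test pattern to the route whose reference pattern is closest under EMD."""
    return min(references, key=lambda route: emd_1d(test, references[route], position))
```

Unlike a simple overlap count, EMD rewards a test call whose sectors are near, but not identical to, a route's sectors, which matters when routes run close to one another.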
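The scanner-based method can be illustrated with a naive likelihood model, which is an assumption and not the authors' classifier: for each route, estimate the probability of each antenna as the fraction of GPS-stamped scan points where it had the strongest signal, treat the antennas in a call's sequence as independent, and pick the route with the highest log-likelihood. The scan-point format, the probability floor for unseen antennas, and the function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def route_model(scan_points: list[dict[str, float]]) -> dict[str, float]:
    """Estimate P(antenna | route) from scanner data.

    Each hypothetical scan point maps antenna_id -> signal strength (dBm);
    we assume a phone at that point would use the strongest antenna.
    """
    strongest = Counter(max(point, key=point.get) for point in scan_points if point)
    total = sum(strongest.values())
    if total == 0:
        return {}
    return {antenna: n / total for antenna, n in strongest.items()}


def log_likelihood(sequence: list[str], model: dict[str, float], floor: float = 1e-4) -> float:
    """Naive log-likelihood treating antennas as independent; unseen antennas get `floor`."""
    return sum(math.log(model.get(antenna, floor)) for antenna in sequence)


def most_likely_route(sequence: list[str], models: dict[str, dict[str, float]]) -> str:
    """Select the route under which the observed antenna sequence is most likely."""
    return max(models, key=lambda route: log_likelihood(sequence, models[route]))
```

The floor keeps a single antenna missing from a route's scans from ruling that route out entirely; a fuller model would also use the order of antennas along the route.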
of a geographic area during arbitrary
time periods. Of particular interest to
city officials is how the mix of inhabitants changes during special occasions
(such as extreme weather, construction projects, and regional events).
Knowing where people come from can
help them in advertising for the event
and easing traffic congestion.
One such occasion in Morristown
was the St. Patrick’s Day Parade on
Saturday, March 12, 2011, from 11 a.m.
to 3 p.m. We repeated the analysis we used
to obtain the laborshed, but restricted it to cellphone transactions handled during
the parade by the antennas pointing along the parade route.
Figure 4 shows the resulting paradeshed:
parade-day activity compared with activity
on the same antennas during the same time interval on a typical
Saturday without special events. The
parade is a county affair, so we
expected the event to draw
widely from other parts of the county
(north and west of Morristown). Indeed, the areas north and west
of Morristown show large increases, while areas to the south and east
show smaller increases. Prior to the
advent of cellular networks, it was notably difficult for local officials to obtain this information except through
The quality of life in any urban area is
directly influenced by the frustration,
pollution, time lost, and noise of traffic congestion. Efforts by planners to
improve traffic flow without sacrificing street life require a thorough understanding of existing traffic conditions.
Since traditional methods of obtaining
traffic data are expensive, we set out to
determine whether we could estimate
traffic volumes from CDRs.
To explore traffic volume on major
commuting routes into Morristown,
we followed the data-collection procedure described earlier for calculating the laborshed. However, in
this case we recorded activity in and
around Morristown from December
2009 to January 2010. We used two filters to obtain an appropriate subset of
CDRs for the study: First, to retain data
about moving vehicles, we used only
voice CDRs including antennas on at
least five towers, as indicated by our