size of the main visualization, to display as many items as possible,
and to minimize scrolling.
Missing Data. In the original GTD, all data for the year 1993 is missing due to an office move [6]. If the missing data were treated as zero,
there would be a sudden gap in 1993, possibly distracting users.
Another possibility was to interpolate numbers based on the previous and following year(s). With such an approach, it was
best to draw the chart according to the interpolated value, but also
to signal to users that the data for this year is missing.
In order to let users know that 1993 does not have accurate data, an
asterisk was placed along the horizontal axis, where 1993 was supposed
to be. Hovering the mouse pointer over this asterisk reveals a text box
mentioning that data for 1993 is missing, and the current graph is
drawn based on interpolated values. This indicates that data in 1993 is
dealt with specially, without disturbing the overall flow of the visualization.
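The interpolation described above can be sketched as follows. This is a minimal illustration, assuming simple linear interpolation between the nearest years that have data; the source does not state which interpolation method was used, and the function name and data layout are hypothetical.

```python
def fill_missing_year(counts, missing_year):
    """Estimate the count for a missing year from its nearest neighbours.

    counts maps year -> number of incidents. Linear interpolation is an
    assumption made for this sketch, not the tool's documented method.
    Returns a filled copy of counts and the set of interpolated years,
    so that the chart can mark those years (for example with an asterisk).
    """
    prev_year = max(y for y in counts if y < missing_year)
    next_year = min(y for y in counts if y > missing_year)
    weight = (missing_year - prev_year) / (next_year - prev_year)
    filled = dict(counts)
    filled[missing_year] = counts[prev_year] + weight * (
        counts[next_year] - counts[prev_year]
    )
    return filled, {missing_year}
```

Returning the set of interpolated years alongside the values keeps the drawn value and the "data is missing" marker separate, which matches the asterisk approach described above.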
There are also several incidents where a few fields have empty values. Such cases were just grouped as having the value “N/A.”
Coloring. Color plays an important role in the visualization. Not
only does it let users distinguish adjacent stripes, but it can also convey additional information regarding the underlying data. Also, if
properly used it can make the visualization aesthetically appealing
and more enjoyable. Colors that would convey extra information,
but not overwhelm users, were chosen.
When the visualization is grouped by the country attacked, a similar color was used for countries in the same region. These colors are
based on the colors assigned to the stripes in the “Region” view. This
also makes the interaction between the “Region” and “Countries” views
mentioned in chapter 3.1.1 smooth.
When meaningful extra information could not be found, a random RGB value was generated as the color of each stripe. The random value was biased toward stronger blue and green values, a
relatively weaker red value, and softer hues.
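A biased random color generator of this kind might look like the sketch below. The channel ranges are assumptions chosen for illustration; the source gives only the qualitative bias (stronger blue and green, weaker red, softer hues), not the actual numbers.

```python
import random


def random_stripe_color(rng=random):
    """Return an (r, g, b) tuple biased toward soft blue-green hues.

    The numeric ranges are illustrative assumptions. Green and blue are
    drawn from a high range, and red is always drawn below both, so a
    strong yellow (high red and green, low blue) cannot occur.
    """
    green = rng.randint(150, 230)
    blue = rng.randint(150, 230)
    red = rng.randint(90, min(green, blue) - 10)
    return red, green, blue
```

Keeping every channel at a moderate level avoids fully saturated colors, which is one simple way to obtain softer hues.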
When users hover the mouse pointer over a stripe, the targeted
stripe’s border is colored a bright yellow, and its width is thickened.
The random color generator is biased to avoid creating a strong
yellow. This keeps the highlighting effective. If adjacent stripes have
similar colors, it is difficult to tell them apart. By drawing a thin white
border around the stripe, this problem can be lessened. But when the
data set is large and several hundreds of stripes have to be displayed,
some parts of the chart become crammed, making individual stripes
appear indistinguishable. This causes an unpleasant white region to
be rendered, due to the dense collection of borderlines. By assigning
low alpha values to borderlines, both of the issues above are
addressed. There is still a separation between adjacent, similarly colored stripes, and crammed areas still display a hint of their original
color. It also remains apparent that such an area is heavily filled.
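The effect of a low alpha value can be illustrated with the standard "over" blending formula, in which each output channel is alpha times the border channel plus (1 - alpha) times the stripe channel. The sketch below is only an illustration of that arithmetic; the alpha value and colors are made up, and the function name is hypothetical.

```python
def blend_over(border_rgb, stripe_rgb, alpha):
    """Blend a border color over a stripe color with the given opacity."""
    return tuple(
        round(alpha * b + (1 - alpha) * s)
        for b, s in zip(border_rgb, stripe_rgb)
    )


white = (255, 255, 255)
stripe = (0, 100, 200)  # made-up stripe color
opaque_border = blend_over(white, stripe, 1.0)       # pure white
translucent_border = blend_over(white, stripe, 0.2)  # tinted by the stripe
```

With a fully opaque border the stripe color is lost entirely, whereas a translucent border keeps most of it, which is why dense regions retain a hint of their original color.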
For a comparison between several coloring schemes that were
tried, refer to http://www.cs.umd.edu/hcil/gtd/intro.html.
Performance. The original GTD raw data is over 60 megabytes in size
and comprises more than 75,000 incidents. The loading and processing
time for data of this size is significant on a Web-based program running
on a personal computer. Also, as the current working set of data grows,
many of the interactions become less and less responsive to user input.
Therefore the data was preprocessed to reduce the loading time. A
Python program was written that took the original data and counted
the number of incidents per year grouped by certain criteria. A separate data file was created for each criterion, and is loaded each time
the user asks for the corresponding criterion. This simplifies the data,
trimming it down to a manageable size. This also makes it fairly easy
to extend the tool to support different types of grouping criteria. But by
preprocessing data, much of the detailed information is lost. Runtime
responsiveness is also an issue when dealing with data sets with more
than 1,000 items. Updates to the visualization often take several seconds. Transition animations serve a twofold purpose: changes
become more perceivable, and redrawing delays can be masked.
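The preprocessing step described above could be sketched roughly as follows. This is not the original program: the column names ("year" and the grouping column), the output layout, and the function names are assumptions made for illustration. Empty field values are grouped under "N/A", as described in the Missing Data paragraph.

```python
import csv
from collections import Counter


def count_incidents(rows, criterion):
    """Count incidents per (group, year) for one grouping criterion.

    rows is an iterable of dicts, for example from csv.DictReader.
    The column names are hypothetical.
    """
    counts = Counter()
    for row in rows:
        group = row[criterion] or "N/A"  # empty fields become "N/A"
        counts[(group, int(row["year"]))] += 1
    return counts


def write_counts(counts, out_file):
    """Write one small data file for a single criterion."""
    writer = csv.writer(out_file)
    writer.writerow(["group", "year", "incidents"])
    for (group, year), n in sorted(counts.items()):
        writer.writerow([group, year, n])
```

Calling count_incidents once per criterion and writing each result to its own file yields the one-file-per-criterion layout, at the cost of discarding the per-incident detail.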
Very thin stripes are not displayed. These stripes barely affect the
appearance of the chart, yet drawing them can severely reduce the rendering speed. The
threshold for determining such stripes is based on the relative thickness of the stripe in question with regard to the thickest stripe that is
currently displayed.
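A relative-thickness filter of this kind can be sketched as below. The cutoff ratio of 1% is an assumed value for illustration only; the source does not give the actual threshold, and the function name is hypothetical.

```python
def visible_stripes(thickness_by_stripe, min_ratio=0.01):
    """Keep stripes whose thickness is at least min_ratio of the thickest.

    thickness_by_stripe maps a stripe name to its current thickness.
    min_ratio is an assumed cutoff, not the value used by the tool.
    """
    if not thickness_by_stripe:
        return {}
    thickest = max(thickness_by_stripe.values())
    if thickest <= 0:
        return {}
    return {
        name: t
        for name, t in thickness_by_stripe.items()
        if t / thickest >= min_ratio
    }
```

Because the cutoff is relative to the thickest stripe currently displayed, the set of hidden stripes changes as the user filters or regroups the data.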
Results
Graduate students enrolled in the University of Maryland Information
Visualization Spring 2008 course provided feedback on how the tool
may be improved. Several graduate students in the Computer Science
department of the University of Maryland were asked to report interesting findings in the GTD Explorer, and were observed to see how
users use the tool. Suggestions from such users will be discussed in
the next chapter. Researchers from START have been using the GTD
Explorer for over six weeks in public presentations, and for their own
understanding. They have been relying on the tool to get a better
sense of major trends in the GTD data and to explain those trends to others.
The START group wishes to push this visualization tool out to the
intelligence and defense communities, as well as the general public,
to gain further feedback. The GTD Explorer is expected to be posted
on the START Web site. User comments on the page should provide
more valuable feedback.
Findings
The tool is especially useful for observing the major time-linked
structures buried in the data. For example, many “hotbeds” of terrorism from the 1970s and early 1980s—such as Italy—became far
less important after the 1990s, despite the overall increase in terrorism activity during this time.
Figure 11: Italy. There is a sharp decline in the number of incidents
in the 80s.