a mixed-methods analysis approach combining qualitative
and quantitative observations.
5.1. Lab Study
We first ran a pilot study with 6 subjects (2 females, 4 males),
all of whom were members of our immediate research team.
Comments from the pilot were visible in a subsequent 12-subject (3 females, 9 males) study, with subjects drawn from
our greater research lab. Subjects were at least peripherally
familiar with each other and many were coworkers. Ages
ranged from the early twenties to the mid-fifties, and education varied from the undergraduate to the doctoral level,
spanning backgrounds in computer science, design, social
science, and psychology. Concerned that our lab’s focus on collaborative software might bias results, we replicated the lab study in a university environment with an additional 12 subjects (5 females, 7 males). Subject variation in age, education, and social familiarity remained similar.
Subjects conducted a 25-minute usage session of the
sense.us system. A single visualization was available in the
study: a stacked time series of the U.S. labor force over time,
divided by gender (Figure 1). Users could navigate the visualization by typing text queries (matched against job title prefixes), filtering by gender, and setting the axis scale to either total people count or percentage values.
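To make these navigation operations concrete, here is a minimal sketch of the two query mechanics described above: prefix matching on job titles and switching between count and percentage scales. The data layout and the `query` function are hypothetical illustrations, not the actual sense.us implementation.

```python
# Illustrative sketch only; the data layout and names are hypothetical,
# not the sense.us code.
from collections import defaultdict

# (year, job_title, gender) -> number of workers
labor = {
    (1900, "blacksmith", "male"): 50_000,
    (1900, "teacher", "female"): 120_000,
    (2000, "teacher", "female"): 2_400_000,
}

def query(prefix, gender=None, scale="count"):
    """Filter the series by job-title prefix (and optionally by gender),
    reporting either raw counts or the percentage of each year's total."""
    totals = defaultdict(int)   # per-year totals, used for percentage scaling
    matches = defaultdict(int)  # per-year counts for matching records
    for (year, title, g), n in labor.items():
        totals[year] += n
        if title.startswith(prefix) and (gender is None or g == gender):
            matches[year] += n
    if scale == "percent":
        return {y: 100.0 * matches[y] / totals[y] for y in matches}
    return dict(matches)

# e.g., job titles beginning with "tea", shown as a share of each year's total
print(query("tea", gender="female", scale="percent"))
```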
This data set was chosen for several reasons. First, job
choice is a topic that most of our users should have no difficulty relating to. Second, like many other real-world data sets, this one has data collection issues, including missing data
and unclear or antiquated labels. Third, we suspected the
data would be an interesting boundary case for annotations,
as for many visualization views, text seemed sufficient when
referencing spikes or valleys in the data.
After a brief tutorial on system features, participants were
instructed to use the system however they liked—no specific
tasks were given. However, users were told that if they felt at
a loss for action, they could browse the data for trends they
found interesting and share their findings. An observer was
present taking notes and a think-aloud protocol was used.
User actions were also logged by the software. Subjects were
run in sequential order, such that later participants could
view the contributions of previous subjects but not vice
versa. The system was seeded with five comments, each with
an observation of a particular data trend.
After the study, subjects completed a short exit questionnaire about their experiences. Participants were asked to
rate on a 5-point Likert scale to what degree (1) they enjoyed using the system, (2) they learned something interesting, (3) others’ comments were helpful in exploring the data, and if they found annotations useful for (4) making their own comments, or (5) understanding others’ comments. Subjects
were also asked free response questions about what they
liked, disliked, and would change about the system.
5.2. Live Deployment
We also conducted a live deployment of the system on the
IBM corporate intranet for 3 weeks. Any employee could log
in to the system using their existing intranet account. Eight
visualizations were available in the system, among them the visualizations of Figures 1 and 2 and a scatterplot of
demographic metrics (see Figure 4). We also introduced two
visualizations specific to the company: stacked time series
of keyword tagging activity and individual user activity on
dogear, an internal social bookmarking service. The site was
publicized through an e-mail newsletter, an intranet article,
and individual e-mails.
5.3. Findings
In the rest of this section, we report observations from these
studies, organized by commentary, graphical annotations,
navigation patterns, and use of doubly linked discussion. As
content and tone varied little across studies, the discussion incorporates data aggregated from each. The
data analyzed were drawn from 12.5 hours of qualitative observation and from usage logs including 258 comments: 41 from
the pilot, 85 from the first study, 60 from the second, and 72
from the deployment.
5.4. Comments
We first wanted to learn how comments were being used
to conduct social data analysis—was there a recognizable
structure to the discussions? To find out, we performed a
formal content analysis on the collected comments. Each
paper author independently devised a coding rubric based
upon a reading of the comments. We then compared our
separate rubrics to synthesize a final rubric that each author
used to independently code the comments. The final coding
rubric categorized comments as including zero or more of
the following: observations, questions, hypotheses, links or
references to other views, usage tips, socializing or joking,
affirmations of other comments, to-dos for future actions,
and tests of system functionality. We also coded whether or
not comments made reference to data naming or collection
issues, or to concerns about the Web site or visualization
design. The coded results were compared using Cohen’s
Figure 4: Scatterplot of U.S. states showing median household income (x-axis) vs. retail sales per capita (y-axis). New Hampshire and Delaware have the highest retail sales.
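The statistic truncated above is presumably Cohen’s kappa, the standard chance-corrected measure of inter-coder agreement: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each coder’s marginal label frequencies. A minimal sketch of the computation for a single rubric category follows; the labels are invented for illustration, not the study’s actual codes.

```python
def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' labels."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal rates per category.
    categories = set(coder_a) | set(coder_b)
    p_e = sum((coder_a.count(c) / n) * (coder_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Did each comment contain, say, an "observation"? (invented labels)
a = [1, 1, 0, 1, 0, 0, 1, 1]  # coder A's judgments
b = [1, 0, 0, 1, 0, 1, 1, 1]  # coder B's judgments
print(cohens_kappa(a, b))     # ~0.47: moderate agreement beyond chance
```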