INTERACTIONS.ACM.ORG 72 INTERACTIONS JULY–AUGUST 2014
FORUM EVALUATION AND USABILITY
• What is the output of your work?
Then we asked them to tell us in
more depth about tasks that involve
identifying someone. Things we probed
for included:
• What types of data are available
for use?
• What is the certainty needed in
their identification?
• What features of an individual do
you usually have to start with?
• What features are the most helpful
in making identifications?
• What features do you always search
for?
• What features are the easiest to
obtain?
• What features are the most difficult
to obtain?
• What is the timeframe in which you
must identify someone?
• Have you ever been unsuccessful
identifying an individual? If so, why?
Next we gave participants a
questionnaire that listed a number of
attributes and asked them to rank the
categories that are most important to
them for their identification needs. Here
is a list of categories and some examples
in each:
• demographics (age, gender,
addresses, work history, family and
friends’ names)
• financial information (salary, credit
reports)
• court records (arrests, gun
registrations, outstanding warrants)
• physical attributes (tattoos, height,
weight, fingerprints, birthmarks)
• cyber attributes (IP address, email
addresses, social network usernames)
• official documents (driver’s license,
passport, official ID)
• other (publications, toll-road
records, surveillance videos)
We also asked them to rate, on a
seven-point Likert scale, the degree to
which the following were important to
them in identifying an individual:
• confidence/provenance of the
accuracy of the intelligence gathered
• speed of access to datasets and
analysis
• robustness of evidence over time
• completeness/richness of identity
We then created a description of each
individual’s identification task, using
known attributes and unknown but
desired attributes along with a certainty
that needed to be achieved. In total,
Figure 2. A use case created from one of the interviews.
From Username to the Person
Given an individual’s username, determine who that person (real name, skills, age, beliefs,
etc.) may be in the physical world.
Target audience
Intelligence; law enforcement may use similar techniques
Description
A suspicious article posted online gets the attention of the intelligence community. The
IP address was tracked to an internet cafe in a large city. At this cafe, several incomplete data
points were collected: low-quality surveillance video from the past two weeks, hundreds of
fingerprints, and some credit card information. Also available are the username of this individual,
the text written, the blogging site where this information was posted, and several user
comments on the post. The host of the blogging site is not friendly, so it cannot be leveraged.
The investigator wishes to understand who this person is (and quickly). In particular, they
would like to know the real name of the user, whether the account is shared or individually
owned, the associates of this person, their skill level, age, gender, and ideology.
* NOTE: A SLIGHT VARIATION OF THIS SCENARIO THAT HAS OCCURRED WITH LAW ENFORCEMENT
IS A HANDWRITTEN NOTE TO NEWSPAPERS/GOVERNMENT EMPLOYEES THAT IS USED AS A STARTING POINT.
Domains
• Cyber (username, writing samples, account sharing)
• Biographical (associates, real name, skill level, age, gender, credit card info)
• Psychological (ideology)
• Biometric (fingerprint, gait, face)
* NOTE: BIOMETRIC DATA IS NOT EXPLICITLY LISTED AS A STARTING OR ENDING POINT HERE,
BUT IT MAY VERY WELL BE A KEY TRANSITIONAL POINT NEEDED FOR THIS SCENARIO.
Implications for research
• Can potentially link from one writing sample to another writing sample
• Need a way to differentiate between more than one user using the same account—perhaps
by online behaviors
• Deception is particularly an issue with data online, but confidence may be increased as
more sources point to the same conclusion
Implications for the SID tool
• Analyze writing samples for comparison.
• Multiple identity attribution activities will be occurring at once when one traces through the
social network.
• Ideology (and other subjective attributes) will need user intervention to be determined.
• Attributes and connections that do not currently exist in the model will need to be created
on the fly. This will augment the SID model so that these connections and attributes can
persist and be used again.
• Consider augmenting the tool with capabilities to query social media to gather as much
relevant information as possible.
• More information comes in as the case develops… the given and the end attributes will
likely be a bit more dynamic.
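
The use case in Figure 2 pairs given attributes with desired ones and a certainty to be reached, and notes that confidence may increase as more sources point to the same conclusion. The following is a minimal sketch of how such a record might be represented; it is not the SID tool's actual model. The class name `UseCase`, the 0-to-1 certainty scale, and the noisy-OR rule for combining sources (which assumes the sources are independent) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import prod


@dataclass
class UseCase:
    """Hypothetical record of one identification task."""
    title: str
    given: dict[str, str]        # known attribute -> domain (starting points)
    desired: dict[str, str]      # wanted attribute -> domain (end points)
    required_certainty: float    # assumed 0..1 threshold, not from the article
    evidence: dict[str, list[float]] = field(default_factory=dict)

    def add_evidence(self, attribute: str, source_confidence: float,
                     domain: str = "unknown") -> None:
        """Record one source's support (0..1) for a desired attribute."""
        if attribute not in self.desired:
            # Attributes missing from the model are created on the fly.
            self.desired[attribute] = domain
        self.evidence.setdefault(attribute, []).append(source_confidence)

    def confidence(self, attribute: str) -> float:
        """Noisy-OR over sources: 1 - prod(1 - c_i); 0.0 with no evidence."""
        return 1.0 - prod(1.0 - c for c in self.evidence.get(attribute, []))

    def resolved(self) -> list[str]:
        """Desired attributes whose combined confidence meets the threshold."""
        return [a for a in self.desired
                if self.confidence(a) >= self.required_certainty]


case = UseCase(
    title="From Username to the Person",
    given={"username": "cyber", "writing sample": "cyber",
           "fingerprints": "biometric"},
    desired={"real name": "biographical", "ideology": "psychological"},
    required_certainty=0.9,
)

# Three sources of moderate strength agree on the real name.
for c in (0.6, 0.7, 0.5):
    case.add_evidence("real name", c)

# A new attribute appears as the case develops.
case.add_evidence("shared account", 0.3, domain="cyber")

print(round(case.confidence("real name"), 2))  # 0.94
print(case.resolved())                         # ['real name']
```

Under these assumptions, no single source reaches the 0.9 threshold, but the three together do. Subjective attributes such as ideology stay unresolved until an analyst supplies evidence, which matches the figure's point about user intervention.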