evolves as they spread through a corporate network. Cataphora has developed
a fuzzy search algorithm to detect them,
but Schon admits the task is complex.
Creating an algorithm that organizes
sentences into text blocks, for example,
often forces researchers to make inflexible choices about boundaries, using
punctuation, length limits, paragraph
breaks, or some other scheme. That,
in turn, could cause a program to overlook a document whose author formats
things differently, such as not breaking
the text into paragraphs very frequently
or using unconventional punctuation.
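The boundary problem can be illustrated with a small Python sketch. This is not Cataphora's algorithm, which the article does not describe. It is a hypothetical comparison in which exact matching on paragraph blocks misses a reformatted copy, while overlapping word sequences ("shingles", an assumed stand-in for fuzzy matching) still detect it. The function names, the window size k=4, and the sample messages are all invented for illustration.

```python
import re

def split_on_paragraphs(text):
    """Split text into blocks at blank lines (one inflexible boundary scheme)."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]

def word_shingles(text, k=4):
    """Return the set of overlapping k-word sequences, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def shingle_overlap(a, b, k=4):
    """Fraction of the smaller shingle set that also appears in the other text."""
    sa, sb = word_shingles(a, k), word_shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))

# Hypothetical messages: the second is the first with its paragraph break,
# capitalization, and punctuation stripped, plus a few added words.
original = (
    "Please review the attached budget.\n\n"
    "We must cut travel costs by ten percent this quarter."
)
forwarded = (
    "fyi - please review the attached budget we must cut travel costs "
    "by ten percent this quarter thanks"
)

# Exact matching on paragraph blocks finds nothing in the reformatted copy.
exact_match = any(block in forwarded for block in split_on_paragraphs(original))

# Shingle overlap does not depend on where the author put block boundaries.
similarity = shingle_overlap(original, forwarded)
```

In this toy case the paragraph-based comparison reports no match, while every four-word sequence of the original reappears in the forwarded copy. A real system would also have to cope with edits inside the text, which lower the overlap score.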
Cataphora has also developed a
proprietary set of ontologies that cover
human resources-related topics, marketing issues, product development,
and more to examine various subject-specific communications. One way in
which they are useful, Schon explains,
is for studying the relationships between people and topics. If an executive
is central to communications about
product development, marketing, and
finance, but marginal to those about
sales, it’s likely that she or he is out of
the loop when it comes to the newest
sales tactics. Ontologies can also identify communications related to particular tasks, such as hiring and performance reviews. From there, engineers
can statistically determine what the
“normal” procedure is, and see when
it is and isn’t followed. Thanks to the
training corpus Cataphora has built
over time through its clients, these
ontologies perform quite well. Yet to
detect communication that is specific
to a particular industry, location, or
research group and whose names can
be idiosyncratic, “we may need to examine the workflow and develop more
specific ontologies,” says Schon.
Further analysis helps identify how
employees influence each other at
work. Aral, for example, correlates his
electronically derived network topologies with traditional accounting and
project data, such as revenues and
completion rates, to try to understand
which factors enhance or diminish
certain outcomes. “The old paradigm
was that each employee had a set of
characteristics, like skills or education,
which he or she brought to a firm,”
Aral explains. “Our perspective is that
employees are all connected, and that
companies build a network of individuals
who help each other.” In a study of
five years of data from an executive recruiting firm, Aral found that employees who were more central to the firm’s
information flow, communicating more frequently and with a wider
range of people, tended to be more
productive. It makes a certain amount
of sense. “They received more novel
information and could make matches
and placements more quickly,” Aral
notes. In fact, the value of novel information turned out to be quite high.
Workers who encountered just 10 novel words more than the average worker
were associated with an additional $70
in monthly revenue.
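The two signals mentioned above, how often someone communicates and with how many different people, can be computed from a simple message log. The following Python sketch is only an illustration of that idea, not the measure used in Aral's study. The function names and the sample log of (sender, recipient) pairs are hypothetical.

```python
from collections import defaultdict

def contact_breadth(messages):
    """Map each person to the number of distinct people they exchanged messages with."""
    contacts = defaultdict(set)
    for sender, recipient in messages:
        if sender != recipient:
            contacts[sender].add(recipient)
            contacts[recipient].add(sender)
    return {person: len(peers) for person, peers in contacts.items()}

def message_volume(messages):
    """Map each person to the number of messages they sent or received."""
    counts = defaultdict(int)
    for sender, recipient in messages:
        counts[sender] += 1
        counts[recipient] += 1
    return dict(counts)

# Hypothetical log of (sender, recipient) pairs.
log = [
    ("ana", "ben"),
    ("ana", "cy"),
    ("ana", "dee"),
    ("ben", "cy"),
    ("ana", "ben"),
]

breadth = contact_breadth(log)   # distinct contacts per person
volume = message_volume(log)     # messages sent or received per person
most_central = max(breadth, key=breadth.get)
```

Counts like these only describe a position in the network. Relating them to revenue or completion rates would require the accounting and project data discussed above.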
Yet Aral’s conclusions also point to
one of the more challenging aspects of
this type of research. If a position in the
corporate network is associated with increased productivity, is it because of the
nature of that position or because certain kinds of people naturally gravitate
toward it? “You always have to question your assumptions,” admits Aral.
New statistical techniques are needed,
he says, to more accurately distinguish
correlation from causation.
Large-scale data mining presents
another challenge. IBM’s SmallBlue,
which grew out of research at its Watson
Business Center, analyzes employees’
electronic data and creates a networked
map of who they’re connected
to and where their expertise lies. Employees
can then search for people
with expertise on certain subjects and
find the shortest “social path” it would
take to connect them. SmallBlue is an
invaluable tool for large, international
firms, and IBM has used it to connect
its 410,000 employees since 2007. Yet
indexing the 20-plus million emails
and instant messages those employees
write is not a trivial task, not to mention
the 2 million blog and database
entries and 10 million pieces of data
that come from knowledge sharing
and learning activities. It is the largest
publicly known social network dataset
in existence, and the project’s founder,
Ching-Yung Lin, says IBM worked hard
to design a database that would hold
different types of data and dynamically
index the graphs that are generated.
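The shortest "social path" idea can be sketched as a breadth-first search over a graph of who knows whom. This is a minimal Python illustration under assumed data, not SmallBlue's implementation, whose internals the article does not describe. The function name and the tiny graph are hypothetical.

```python
from collections import deque

def shortest_social_path(graph, start, target):
    """Return the shortest chain of contacts from start to target, or None if unreachable."""
    if start == target:
        return [start]
    previous = {start: None}        # person -> who led us to them
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor in previous:
                continue            # already reached by an equally short or shorter route
            previous[neighbor] = person
            if neighbor == target:
                # Walk the chain backward from the target to the start.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Hypothetical contact graph: each person maps to their direct connections.
contacts = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana"],
    "dee": ["ben", "eli"],
    "eli": ["dee"],
}

path = shortest_social_path(contacts, "ana", "eli")
```

Breadth-first search visits people in order of distance from the searcher, so the first chain that reaches the target is a shortest one. At the scale described above, the harder problem is building and indexing the graph, not searching it.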
Aral, S., Brynjolfsson, E., and van Alstyne, M.
Information, technology and information
worker productivity. International
Conference on Information Systems,
Milwaukee, WI, 2006.
Manning, C.D., Raghavan, P., and Schütze, H.
Introduction to Information Retrieval.
Cambridge University Press, New York,
Mikawa, S., Cunnington, S., and Gaskis, S.
Removing barriers to trust in distributed
teams: understanding cultural differences
and strengthening social ties. International
Workshop on Intercultural Collaboration,
Palo Alto, CA, 2009.
Wasserman, S. and Faust, K.
Social Network Analysis: Methods and
Applications. Cambridge University Press,
Wu, L., Lin, C.-Y., Aral, S., and Brynjolfsson, E.
Value of social network: a large-scale
analysis on network structure impact to
financial revenue of information technology
consultants. Winter Information Systems
Conference, Salt Lake City, UT, 2009.
Leah Hoffmann is a Brooklyn, NY-based technology