is a widespread practice of writing
Indian languages using Roman script
as well as mixing them with English while writing or speaking,[g] a phenomenon
referred to as linguistic code-mixing
or code-switching. For any analysis
of social media content from India,
correct processing of code-mixed text
is an absolute necessity; however,
traditional natural language processing (NLP) modules such as language
identifiers, POS taggers, translators,
and word aligners treat linguistic code-switching data either as noise or as a
new language (for example, Hinglish
for Hindi-English code-mixing). Both
views are limited because the former
does not recognize the complexity and
socio-pragmatics of the phenomenon,
whereas the latter does not utilize
the fact that code-mixing is a grammatically informed combination of two
languages. Further, bilingual speakers
show different language preferences
depending on the topic of discussion
and sentiment expressed. This implies
that ignoring code-mixed patterns or
conducting content analysis only for
the predominant language on social
media (usually English) can lead to misleading conclusions and is bound to
miss social and discourse-level
nuances in the data. Several researchers from India have worked to address
different aspects of code-switching;
Microsoft Research India, under
project Melange,[h] has largely led the
initiative. Several semi-supervised
techniques [10] that automatically produce
large, annotated code-mixed datasets
are being developed to help the community efficiently perform downstream
supervised NLP tasks.
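To make the contrast between the two limited views concrete, here is a minimal sketch of word-level language tagging, which treats a code-mixed post as a combination of two languages rather than as noise or as a single new language. The lexicons, the tag names, and the functions `tag_tokens`, `count_switches`, and `language_share` are hypothetical and exist only for this illustration; real systems rely on trained language identifiers, not hand-written word lists.

```python
from collections import Counter

# Hypothetical toy lexicons, for illustration only. Note that "to", "me",
# and "hi" are valid tokens in both Romanized Hindi and English; this toy
# resolves the ambiguity by checking the Hindi list first, which a real
# tagger would instead decide from context.
HINDI_ROMAN = {"aj", "patakhe", "to", "me", "hi", "phutenge"}
ENGLISH = {"sure", "it", "would", "be"}
NAMED_ENTITIES = {"india"}  # treated as language-neutral


def tag_tokens(text):
    """Label each token as 'hi', 'en', 'ne' (named entity), or 'unk'."""
    tags = []
    for raw in text.lower().split():
        token = raw.strip(".,!?;:\"'")
        if not token:
            continue
        if token in NAMED_ENTITIES:
            lang = "ne"
        elif token in HINDI_ROMAN:
            lang = "hi"
        elif token in ENGLISH:
            lang = "en"
        else:
            lang = "unk"
        tags.append((token, lang))
    return tags


def count_switches(tags):
    """Count language changes between consecutive 'hi'/'en' tokens."""
    langs = [lang for _, lang in tags if lang in ("hi", "en")]
    return sum(1 for prev, cur in zip(langs, langs[1:]) if prev != cur)


def language_share(tags):
    """Fraction of language-tagged tokens belonging to each language."""
    counts = Counter(lang for _, lang in tags if lang in ("hi", "en"))
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()} if total else {}


tweet = "aj patakhe to india me hi phutenge, sure it would be"
tags = tag_tokens(tweet)
print(tags)
print("switch points:", count_switches(tags))
print("language share:", language_share(tags))
```

Even this toy version shows why analyzing only the English tokens would discard most of the post, and why ambiguous tokens make plain dictionary lookup insufficient in practice.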
Killfies for social media. In recent
years, the posting of selfies (or digital
self-portraits) on social media websites
such as Facebook, Instagram, and
Snapchat has become a part of mainstream
culture. Often people portray
their adventurousness by posting
dangerous selfies (aka killfies). Since
March 2014, 238 people are reported
to have been killed while taking selfies,[i]
with India dominating these statistics
with 141 deaths. Given the increasing
penetration of mobile technology,
high usage statistics, and the disturbances
caused by such behavior, India
is one of the prime regions where
this problem is particularly relevant.
Research conducted by Precog@IIIT
Delhi[j] identifies dangerous selfies.
The researchers have created datasets,
classifiers, apps, and location-marker
tools in this context. A convolutional
neural network-based classifier that
identifies dangerous selfies posted on
social media using only the image (no
metadata) achieves an accuracy of 98%.
The Saftie Camera[k] app, based on the
developed classifier, works in real-world
settings and detects and warns a user if
the location is potentially dangerous.
[g] For example, a bilingual Hindi/English speaker
posts on Twitter: “aj patakhe to india me hi
phutenge, sure it would be,” where the segment
“aj patakhe to india me hi phutenge” (“today
fireworks will occur in India only”) is in Hindi
written in Roman script.
Important funding initiatives.
There have been many funding initiatives
from both government and
non-government agencies to popularize
social media research. Among
those initiatives is the Indo-German
Max Planck Center for Computer
Science, a five-year project on
“Understanding, leveraging and deploying
online social networks,” jointly funded
by the Indian Department of Science
and Technology and the Max Planck Society.
Another initiative is the five-year
project on “Post disaster situation
analysis and resource management,”
funded by Media Lab Asia and Information
Technology Research Academy (ITRA),
which supported research on the role
of social media in disaster management.
Challenges. Presently, the world is
witnessing several negative impacts of
OSMs. Hence, it is important for the
computing world, with intense research
input from scientists all over the world,
to mitigate these impacts. The specific
problems are many—fake news, hate
speech, the shaming of individuals or
groups. It is now clear that, in the garb
of spontaneity, companies, political
parties, and individuals are constantly
manipulating the systems to produce
trending topics and thus control discussions
on social media. The problems
are compounded in India by the
unprecedented rise in the use of local or
code-mixed languages; hence the need for
special attention from Indian researchers.
A diametrically opposite area
of research is leveraging social
media for social good: the work on
post-disaster management reported here
is one example, and future directions
include using social media content to
devise better governance mechanisms,
supporting individuals and groups with
health-related issues, and making quality
education accessible to the huge population by
connecting teachers with students
located in different places.
[i] http://bit.ly/saftie-bot
Acknowledgments. The authors
thank Sunita Sarawagi, Abir De, and
the anonymous reviewers for providing
constructive feedback.
References
1. Bhattacharya, P. et al. Deep Twitter diving: Exploring
topical groups in microblogs at scale. In Proceedings
of the 17th ACM Conf. Computer Supported Cooperative
Work and Social Computing, 2014, 197–210.
2. Chakraborty, A., Messias, J., Benevenuto, F., Ghosh,
S., Ganguly, N. and Gummadi, K.P. Who makes trends?
Understanding demographic biases in crowdsourced
recommendations. In Proceedings of the 11th Intern.
AAAI Conf. Web and Social Media, 2017.
3. De, A., Bhattacharya, S. and Ganguly, N. Demarcating
endogenous and exogenous opinion diffusion process
on social networks. In Proceedings of the 2018 World
Wide Web Conf., 2018, 549–558.
4. De, A., Valera, I., Ganguly, N., Bhattacharya, S. and
Gomez-Rodriguez, M. Learning and forecasting opinion
dynamics in social networks. In Proceedings of the
30th Intern. Conf. Neural Information Processing
Systems, 2016, 397–405.
5. Maity, S.K., Chakraborty, A., Goyal, P. and Mukherjee,
A. Opinion conflicts: An effective route to detect
incivility in Twitter. In Proc. ACM Hum.-Comput.
Interact. Article 117 (2018), 117:1–117:27.
6. Pratapa, A., Bhat, G., Choudhury, M., Sitaram, S.,
Dandapat, S. and Bali, K. Language modeling for code-mixing: The role of linguistic theory based synthetic
data. In Proceedings of the 56th Annual Meeting of the
Assoc. Computational Linguistics, Vol. 1. (Melbourne,
Australia, 2018), 1543–1553; https://www.aclweb.org/
7. Rudra, K., Ganguly, N., Goyal, P. and Ghosh, S.
Extracting and summarizing situational information
from Twitter social media during disasters. ACM
Trans. Web 12, 3 (July 2018), 17:1–17:35.
8. Rudra, K., Goyal, P., Ganguly, N., Mitra, P. and Imran, M.
Identifying sub-events and summarizing disaster-related information from microblogs. In Proceedings
of the 41st Intern. ACM SIGIR Conf. Research and
Development in Info. Retrieval, 2018, 265–274.
9. Sachdeva, N. and Kumaraguru, P. Call for service:
Characterizing and modeling police response to
serviceable requests on Facebook. In Proceedings of
the ACM Conf. Computer-Supported Cooperative Work
and Social Computing, 2017.
10. Samanta, B., Nangi, S.R., Jagirdar, H., Ganguly, N. and
Chakrabarti, S. A deep generative model for code
switched text. In Proceedings of IJCAI, 2019.
11. Zafar, M.B., Bhattacharya, P., Ganguly, N., Ghosh,
S. and Gummadi, K.P. On the wisdom of experts
vs. crowds: Discovering trustworthy topical news
in microblogs. In Proceedings of the ACM Conf.
Computer-Supported Cooperative Work and Social
Computing, 2016, 438–451.
Niloy Ganguly (email@example.com) is a professor at IIT
Ponnurangam Kumaraguru (firstname.lastname@example.org) is an
associate professor at IIIT Delhi, India.
© 2019 ACM 0001-0792/19/11