tion and get community support.
Moreover, they demonstrated that a
community of low-income, low-literate people can moderate themselves
without any outside support, thereby
addressing the content management
challenge of these voice forums.
The second key challenge in scaling voice forums is the airtime cost.
Often, these services use expensive
toll-free lines to remain accessible
to low-income users. The resulting
cost places a heavy burden on sustainability, often putting these services
at risk of being shut down as
usage grows. While a few services
sustain themselves through advertisements, grants, and partnerships
with telecoms or governments, these
options are often beyond the reach of
most voice forum providers. To make
these services financially sustainable,
Vashistha et al. examined whether
low-income users of voice forums
could complete useful work on their
mobile phones to offset their participation costs. In 2016, they created
Respeak, the first voice-based crowdsourcing marketplace that pays users
to transcribe audio files vocally.
Respeak sends short audio segments
to multiple voice forum users and
pays them via mobile airtime for
each submitted transcript. Instead of
typing the transcript, users respeak
audio content into an off-the-shelf
speech recognition engine and
submit the autogenerated transcript.
Respeak combines the transcripts for
each segment from multiple users using
sequence-alignment algorithms
to reduce random speech recognition
errors. It then pays users in mobile
airtime based on the accuracy of
the transcripts they submit. In the
last three years, Respeak has been
used by low-income students, blind
people, and rural residents in India to
produce speech transcriptions with
over 90% accuracy at one-fourth of
the market rate, generating sufficient
profit to subsidize their participation
costs. One minute of crowd work on
Respeak enables users to earn eight
minutes of airtime.
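The merge-and-pay step described above can be sketched in a few lines. This is a minimal illustration, not Respeak's actual implementation: it assumes word-level alignment with Python's standard `difflib.SequenceMatcher` as a stand-in for the sequence-alignment algorithms, uses the first transcript as the alignment backbone, and scores each user against the merged transcript. The function names, the `max_payout` and `threshold` parameters, and the payout rule itself are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def merge_transcripts(transcripts):
    """Merge several transcripts of one segment by word-level majority vote."""
    tokenized = [t.lower().split() for t in transcripts]
    pivot = tokenized[0]  # assumption: the first transcript is the alignment backbone
    votes = [Counter({word: 1}) for word in pivot]
    for other in tokenized[1:]:
        matcher = SequenceMatcher(a=pivot, b=other, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
                # Aligned word for word: each word votes for its pivot slot.
                for offset in range(i2 - i1):
                    votes[i1 + offset][other[j1 + offset]] += 1
            elif tag == "delete":
                # Pivot words missing from this transcript: vote to drop them.
                for i in range(i1, i2):
                    votes[i][""] += 1
            # Insertions and uneven replacements are ignored in this sketch.
    winners = [counter.most_common(1)[0][0] for counter in votes]
    return " ".join(word for word in winners if word)


def word_accuracy(transcript, reference):
    """Share of reference words that the transcript reproduces in order."""
    hyp, ref = transcript.lower().split(), reference.lower().split()
    if not ref:
        return 0.0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


def airtime_payout(accuracy, max_payout=1.0, threshold=0.5):
    """Hypothetical rule: nothing below the threshold, otherwise proportional pay."""
    return round(max_payout * accuracy, 2) if accuracy >= threshold else 0.0


transcripts = [
    "the farmer sold ice in the market",   # recognizer misheard "rice"
    "the farmer sold rice in the market",
    "the farmer told rice in the market",  # recognizer misheard "sold"
]
merged = merge_transcripts(transcripts)
print(merged)  # the farmer sold rice in the market
for t in transcripts:
    acc = word_accuracy(t, merged)
    print(round(acc, 2), airtime_payout(acc))
```

Because each recognition error here occurs in only one of the three transcripts, the vote at every word position recovers the intended word. Errors shared by most users would survive the vote, which is why the text speaks of reducing random errors rather than all errors.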
Grand Challenges: Harassment, Misinformation, and Disinformation
Voice forums, like any other social
platform, come with their own pitfalls.
They end up reflecting the existing
sociocultural norms and values of the
society, including its shortcomings
and biases. For example, while Swara
and Baang served as instruments of
inclusion for low-literate, rural, indigenous, and visually impaired communities, they failed to create a welcoming environment for female users.
Women faced systemic discrimination
and harassment in the form of messages that contained abuse, threats,
and flirting.
Both mainstream social media
platforms and voice forums face grand
challenges when tackling misinformation, disinformation, harassment, and
abuse. These platforms and forums
differ greatly in terms of scale, features, interfaces, supported languages,
and target users. Consequently, solutions to tackle these challenges on a
platform like Facebook might be ineffective
for voice forums, and vice versa.
This presents interesting research
challenges of identifying indecorous
content in local-language audio, filtering
out spreaders of disinformation,
and addressing situations where the
collective ignorance of community
members eclipses their collective intelligence.
The HCI4D community must
tackle these grand challenges to make
the Internet of the orals more diverse,
inclusive, and impactful.
Figure. Three waves of voice forums in low-resource environments (timeline, 2007 to 2019). Wave labels: Access and Inclusion; Training and Spread; Managing Content and Costs. Systems named in the figure: Avaaj Otalo, CGNet Swara, Spoken Web, Ila Dhageyso, Sangeet Swara, Respeak and similar systems.
1. Gulaid, M. and Vashistha, A. Ila Dhageyso: An
interactive voice forum to foster transparent
governance in Somaliland. In Proceedings of the
6th Intern. Conf. Information and Communications
Technologies and Development: Notes, Vol. 2 (Cape
Town, South Africa, 2013), 41–44.
2. Mudliar, P. et al. Emergent practices around CGNet
Swara, voice forum for citizen journalism in rural
India. In Proceedings of the 5th Intern. Conf.
Information and Communication Technologies and
Development (Atlanta, GA, USA, 2012), 159–168.
3. Patel, N. et al. Avaaj Otalo: A field study of an
interactive voice forum for small farmers in rural
India. In Proceedings of the SIGCHI Conf. Human
Factors in Computing Systems (Atlanta, GA, USA,
4. Raza, A.A. et al. Baang: A viral speech-based social
platform for under-connected populations. In
Proceedings of the 2018 CHI Conf. Human Factors
in Computing Systems (Montreal, QC, Canada, 2018),
5. Raza, A.A. et al. Job opportunities through
entertainment: Virally spread speech-based services
for low-literate users. In Proceedings of the SIGCHI
Conf. Human Factors in Computing Systems (Paris,
France, 2013), 2803–2812.
6. Sherwani, J. et al. Healthline: Speech-based access to
health information by low-literate users. Intern. Conf.
Information and Communication Technologies and
Development (Bangalore, India, 2007), 1–9.
7. Vashistha, A. et al. BSpeak: An accessible voice-based
crowdsourcing marketplace for low-income blind
people. In Proceedings of the 2018 CHI Conf. Human
Factors in Computing Systems (Montreal, QC, Canada,
2018), 57:1–57:13.
8. Vashistha, A. et al. ReCall: Crowdsourcing on basic
phones to financially sustain voice forums. In
Proceedings of the 2019 CHI Conf. Human Factors in
Computing Systems (Glasgow, Scotland, U.K., 2019).
9. Vashistha, A. et al. Respeak: A voice-based, crowd-powered speech transcription system. In Proceedings
of the 2017 CHI Conf. Human Factors in Computing
Systems (Denver, CO, USA, 2017), 1855–1866.
10. Vashistha, A. et al. Sangeet Swara: A community-moderated voice forum in rural India. In Proceedings
of the 33rd Annual ACM Conf. Human Factors in
Computing Systems (Seoul, South Korea, 2015),
11. Vashistha, A. et al. Threats, abuses, flirting, and
blackmail: Gender inequity in social media voice
forums. In Proceedings of the 2019 CHI Conf. Human
Factors in Computing Systems (Glasgow, Scotland,
U.K., 2019).
12. Wolfe, N. et al. Rapid development of public health
education systems in low-literacy multilingual
environments: Combating Ebola through voice
messaging. In Proceedings of the ISCA Special
Interest Group on Speech and Language Technology in
Education (Leipzig, Germany, 2015).
Aditya Vashistha is an assistant professor at Cornell
University, Ithaca, NY, USA.
Umar Saif is UNESCO Chair, ICTD, Lahore, Pakistan.
Agha Ali Raza is an assistant professor at Information
Technology University, Lahore, Pakistan.
© 2019 ACM 0001-0792/19/11