of open response items: Programming
assessment as a case study. In
Proceedings of the Workshop on Data
Driven Education, 2013.
2. Gupta, R.R. et al. DeepFix: Fixing
common C language errors by deep
learning. In AAAI 2017.
3. Shashidhar, V., Pandey, N., and
Aggarwal, V. Spoken English grading:
Machine learning with crowd
intelligence. In Proceedings of the
21st ACM SIGKDD Intern. Conf.
Knowledge Discovery and Data Mining,
KDD ’15.
4. Shashidhar, V., Pandey, N., and
Aggarwal, V. Automatic spontaneous
speech grading: A novel feature
derivation technique using the
crowd. In Proceedings of the 53rd
Annual Meeting of the Association for
Computational Linguistics and the 7th
Intern. Joint Conf. Natural Language
Processing.
5. Singh, B. P. and Aggarwal, V. Apps to
measure motor skills of vocational
workers. In Proceedings of the 2016
ACM Intern. Joint Conf. Pervasive and
Ubiquitous Computing.
6. Singh, G., Srikant, S., and Aggarwal,
V. Question independent grading
using machine learning: The case
of computer program grading. In
Proceedings of the 22nd ACM SIGKDD
Intern. Conf. Knowledge Discovery and
Data Mining, 2016.
7. Singh, R., Gulwani, S., and Solar-Lezama, A. Automated feedback
generation for introductory
programming assignments. In
Proceedings of the 34th ACM SIGPLAN
Conf. Programming Language Design
and Implementation, 2013.
8. Srikant, S. and Aggarwal, V. A system
to grade computer programming
skills using machine learning. In
Proceedings of the 20th ACM SIGKDD
Intern. Conf. on Knowledge Discovery
and Data Mining, 2014.
9. Takhar, R. and Aggarwal, V.
Grading uncompilable programs.
In Proceedings of the Innovative
Applications of Artificial Intelligence
Conf. Assoc. Advancement of Artificial
Intelligence, 2019.
Shashank Srikant is a Ph.D. candidate at
Massachusetts Institute of Technology,
Cambridge, MA, USA.
Rohit Takhar is a research engineer at
Aspiring Minds, Gurugram, India.
Vishal Venugopal is a senior software
engineer at Aspiring Minds, Gurugram,
India.
Varun Aggarwal is co-founder and Chief
Technology Officer of Aspiring Minds,
Gurugram, India.
© 2019 ACM 0001-0782/19/11
exist solutions and products
that evaluate language skills
subjectively, most solutions
provided by established,
international educational
testing and assessment organizations focus on testing
general aptitude skills and
adopt traditional testing
formats like MCQs.
We illustrate the broad
industry verticals we have
developed tools for, each
highlighting a research
problem it addresses and
the associated innovative
intervention we devised.
˲ Programming and software engineering.
Automata [6, 8, 9] uses ML models to
automatically score computer programs on parameters
such as functional correctness, complexity, and style.
These models use intelligent
features extracted from
programs, which can signal
correctness even when they
fail to compile. Importantly,
we designed them to be
independent of the task the
program solves, thus allowing assessments to scale to
a wide variety of questions.
There have been attempts by
other research groups [2, 7] at
analyzing programs solving
introductory programming
problems. They, however, focus on providing automated
feedback. Our work differs
in that we focus on grading programs on a rubric.
To achieve this, we extract
key data flow properties in
programs that capture their
meaning and use them as
features in an ML model;
the problems we model are
significantly more involved
than introductory problems and exist in multiple
languages.
˲ Customer service. The
IT-enabled services (ITeS)
market in India employs
four million people and is
a US$181-billion industry.
Spoken English skills are
central to this industry.
SVAR [3, 4] evaluates speaking
skills at scale. Applicants call
a phone number, have a conversation with an automated
interactive system, and on
hanging up, receive a score
on their spoken skills such
as pronunciation and fluency. It draws from speech
and signal processing
technologies and uses ML to
predict these scores. To reduce evaluation time, and to
improve model accuracy, we
innovated by crowdsourcing
parts of our feature extraction and model evaluation.
˲ Blue-collar jobs.
An estimated four-and-a-half million people in India are
employed in blue-collar
jobs. However, no automated means existed to assess
motor skills, a key requirement in these jobs. Akin to
how computers serve as a
medium to test cognitive
skills, we showed how touch
devices can be used to assess
motor skills [5]. This requires
a person to use their fingers
and wrists to play specific
games designed for tablet
apps. We have shown that
performance on these tasks
correlates with on-the-job performance.
˲ Professional communication. Email correspondence
has become an integral part
of the communication tool
chain in any organization.
To test professionals’ email
writing skills, we employ
deep learning and NLP to
assess various aspects like
grammar, content, and
structure.
To our knowledge, this is
the first attempt at designing and productizing such
ML-driven technologies to
assess these specific skills.
˲ Domain knowledge. In
consultation with subject-matter and industry experts,
we have designed 300+
tests for domain knowledge
across various industry verticals such as IT, ITeS, retail,
manufacturing, BFSI, hospitality, and telecom. Backed
by statistical techniques
such as item response
theory, these tests provide
standardized assessments
in specific topics, helping
create a level playing field for
job applicants.
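As an illustration of how item response theory standardizes scores, the sketch below uses the two-parameter logistic (2PL) model, a common textbook formulation. The article does not say which item response model these tests use, so the model choice, the function names `p_correct` and `estimate_ability`, and the item parameters are all assumptions made for this example.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model, not from the article).

    theta: test taker's ability
    a:     item discrimination
    b:     item difficulty
    Returns the probability of answering the item correctly.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of ability.

    responses: list of 0/1 outcomes, one per item
    items:     list of (a, b) pairs, hypothetical calibrated parameters
    """
    best_theta, best_ll = lo, -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        ll = 0.0
        for r, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            # Log-likelihood of the observed right/wrong answer.
            ll += math.log(p) if r else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


# Three hypothetical items: easy, medium, hard.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
strong = estimate_ability([1, 1, 0], items)
weak = estimate_ability([1, 0, 0], items)
```

Because ability is estimated on a common scale from calibrated item parameters, two applicants who see different items can still be compared. Note that an all-correct or all-wrong response pattern pushes this simple estimator to the edge of the search grid.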
Over the years, we have
gathered a database of
applicants’ performance
in the various verticals
discussed here. This has
helped us quantify the state
of employability in India,
and study a year-on-year
change in employability
conditions. Since 2010, Aspiring Minds has released
annual National Employability Reports, which
have now become the gold
standard for tracking the
quality of higher education
in India, aiding and informing policy formulation.
Besides these opportunities, we have also identified
a number of challenges
in using CS/ML for grading. These include issues
around the quality of labels (expert grades),
low sample sizes, sample characteristics,
and standards for acceptable
errors in models, among
others. Key issues also lie
in developing models
that are causal and in addressing fairness and
bias in grading. These form
areas of active research.
References
1. Aggarwal, V., Srikant, S., and
Shashidhar, V. Principles for using
machine learning in the assessment