Austin Cory Bart, Ryan Whitcomb, Dennis Kafura, Clifford A. Shaffer, and Eli Tilevich
Virginia Tech
Computing with CORGIS: Diverse, Real-world Datasets for Introductory Computing
REPRINT
From the Proceedings of the 2017 ACM SIGCSE Conference. Reprinted with permission.
To successfully bring introductory computing to non-CS majors, one needs to create a curriculum that will appeal
to students from diverse disciplines. Several educational
theories emphasize the need for introductory contexts that
align with students’ long-term goals and are perceived as
useful. Data Science, the use of algorithms to manipulate real-world data and interpret the results, has emerged as a
field with cross-disciplinary value, and has strong potential
as an appealing context for introductory computing courses.
However, it is not easy to find, clean, and integrate datasets
that will satisfy a broad variety of learners. The CORGIS
project (https://think.cs.vt.edu/corgis) enables instructors
to easily incorporate data science into their classrooms.
Specifically, it provides over 40 datasets in areas including
history, politics, medicine, and education. Additionally,
the CORGIS infrastructure supports the integration of new
datasets with simple libraries for Java, Python, and Racket,
thus empowering introductory students to write programs
that manipulate real data. Finally, the CORGIS web-based
tools allow learners to visualize and explore datasets without
programming, enabling data science lessons on day one. We
have incorporated CORGIS assignments into an introductory
course for non-majors to study their impact on learners’
motivation, with positive initial results. These results indicate
that external adopters are likely to find the CORGIS tools and
materials useful in their own pedagogical pursuits.
INTRODUCTION
As computing skills become required for an increasing number
of disciplines, CS educators are meeting the demand by creating courses for non-CS majors. Students are drawn from many
disciplines, including the sciences, arts, and humanities, and
they have myriad, divergent career paths before them distinct
from those of Computer Science majors. This different population needs a different approach, one that all students will perceive as authentic and beneficial, while simultaneously enabling
them to solve problems of a computational nature [9].
We suggest that most students would benefit from computational techniques that empower them to manage the ever-growing amounts of data present in every field [14]. Therefore, a Data Science context, where students write algorithms
to manipulate real-world data and interpret the results, could
provide a compelling context. However, integrating data into