Computing with CORGIS: Diverse, Real-world Datasets for Introductory Computing
At the end of the course, an anonymized survey was administered to all students as an assignment to gather data on their
experience, in full compliance with our institution’s IRB. Four
students did not consent to their survey results being shared,
and six students failed to complete the survey. This yielded an
80% response rate. Half of the students were male and half were
female. Students came largely from the arts and humanities,
with few students from the sciences. The distribution of years
was skewed heavily towards Sophomores and Juniors, with only
a third of the class being Freshmen and Seniors.
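The response rate can be checked against the enrollment reported for this course (50 students). The following is a minimal sketch of that arithmetic; the variable names are ours, and it assumes that the non-consenting students and the students who did not complete the survey are disjoint groups.

```python
# Figures taken from the text: a 50-student course, four students who
# did not consent, and six who did not complete the survey.
enrolled = 50
not_consenting = 4
not_completed = 6

# Assumption: the two groups do not overlap.
usable_responses = enrolled - not_consenting - not_completed
response_rate = usable_responses / enrolled

print(f"{usable_responses} usable responses, {response_rate:.0%} response rate")
```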
Students were asked to rate their agreement with 26
statements on a 7-point Likert scale (from “Strongly Disagree” to “Strongly Agree”). The first 25 statements were
the cross-product of two sets of five aspects. First were the
main course components: learning about abstraction, writing
programs, real-world data, social ethics of computing, and
working in small groups (cohorts). The second set comprised the
elements of the MUSIC model: students’ belief in whether they
had a choice, their interest, their sense of usefulness, their
sense of success, and their belief that the instructors cared.
For example, one statement was “I believe that it was useful to my long-term career goals to learn to write computer
programs”. The last statement, unrelated to the others, was
their intent to continue learning computing, either informally
(e.g., an online course) or formally (e.g., another Computer
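The structure of the survey described above (five course components crossed with five MUSIC elements, plus one statement on intent to continue) can be sketched as follows. This is an illustration only: the short labels for the MUSIC elements are our shorthand, and the sketch enumerates the pairs rather than reproducing the survey's actual wording.

```python
from itertools import product

# Course components, as listed in the text.
components = [
    "learning about abstraction",
    "writing programs",
    "real-world data",
    "social ethics of computing",
    "working in small groups (cohorts)",
]

# Shorthand labels (an assumption) for the five MUSIC elements in the text.
music_elements = ["choice", "interest", "usefulness", "success", "caring"]

# Each (element, component) pair corresponds to one survey statement.
pairs = list(product(music_elements, components))

# One extra statement covers the intent to continue learning computing.
total_statements = len(pairs) + 1

print(len(pairs), total_statements)
```

The example statement quoted above corresponds to the pair ("usefulness", "writing programs").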
Figure 9 presents raw results from the survey. Overall, the
responses were encouraging and mostly positive
(no student marked “Strongly Disagree” for any statement). We found that students’ interest in the course components generally outweighed their sense of how useful those
components were to their long-term goals. They indicated
that working with data related to their own major was more
useful for their career goals than learning to write programs.
Students felt empowered and cared for by their instructors.
They also indicated that they felt successful in learning the material. The lukewarm response to the last statement, students’
intent to continue, is expected since it is not the goal of the
course to recruit new students into computing. However, we
found it disappointing that not a single student marked
“Strongly Agree” for that element.
not distinguish between different types of string values, such as
unique identifiers, URLs, classification codes, or textual data.
Figure 8 shows a word cloud of all the descriptive tags associated with the CORGIS datasets. This graphic illustrates the range
of datasets associated with the project. It also reveals certain
biases and trends in the selection of datasets: “United States”, for
instance, is the single largest word in the cloud. Unfortunately,
this graphic is not helpful in finding under-represented career
paths and disciplines, which is a key problem for the project.
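A word cloud of this kind is driven by how often each tag occurs across datasets. The following is a minimal sketch of that counting step, assuming each dataset carries a list of free-text tags. The function name, the dataset names, and the tag lists are hypothetical and are not drawn from the actual CORGIS metadata.

```python
from collections import Counter

def tag_frequencies(tags_by_dataset):
    """Count how many times each tag appears across all datasets."""
    counts = Counter()
    for tags in tags_by_dataset.values():
        # Normalize case and whitespace so "United States" and
        # "united states" are counted as the same tag.
        counts.update(tag.strip().lower() for tag in tags)
    return counts

# Hypothetical tag lists, invented for illustration only.
example_tags = {
    "dataset_a": ["united states", "weather", "temperature"],
    "dataset_b": ["United States", "crime", "government"],
    "dataset_c": ["music", "art"],
}

frequencies = tag_frequencies(example_tags)
top_tag, top_count = frequencies.most_common(1)[0]
print(top_tag, top_count)
```

Sorting the same counts in ascending order would surface rarely used tags, although a tag count alone cannot show which disciplines are missing entirely.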
CORGIS was used in a 50-student Computational Thinking
course for non-majors. Students took the course to satisfy a
breadth requirement and none had significant prior computing
experience. Students completed instructor-assigned practice
problems and an open-ended final project where they chose
their own dataset from the CORGIS collection.
Figure 8: CORGIS datasets word cloud of tags
Figure 6: Distribution of structure and characteristics of CORGIS datasets
Figure 7: Distribution of types in CORGIS datasets