Computing with CORGIS: Diverse, Real-world Datasets for Introductory Computing
Visualizer: The Visualizer allows students to engage with
datasets in their browser without programming, as shown
in Figure 4. Users can generate histograms, line plots, scatter
plots, and bar charts for any supported dataset. Bar charts are
categorized by the indexes from the specification file. Datasets
can also be filtered on indexed fields when generating charts.
Although not as powerful as a full programming environment,
the Visualizer empowers students to work with real-world data
without specialized training.
Explorer: Independently of the data's content, students can examine
the structure of a dataset using the Explorer, which presents the
data hierarchy as a chain of windows. Clicking a field in a
map, for instance, might open a window for a nested map, which in
turn might contain atomic fields or links to further maps.
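The kind of structural view the Explorer offers can be approximated in a few lines of Python. The sketch below is not the Explorer's implementation; the function name `describe`, the sample record, and the assumption that lists are homogeneous are all illustrative.

```python
def describe(value, name="dataset", depth=0):
    """Return indented lines describing the structure (not the content) of value."""
    indent = "  " * depth
    if isinstance(value, dict):
        # A map: list each field, recursing into nested maps.
        lines = [f"{indent}{name}: map"]
        for key, child in value.items():
            lines.extend(describe(child, key, depth + 1))
        return lines
    if isinstance(value, list):
        lines = [f"{indent}{name}: list"]
        if value:
            # Assumption: lists are homogeneous, so the first item is representative.
            lines.extend(describe(value[0], "[item]", depth + 1))
        return lines
    # An atomic field: report only its type.
    return [f"{indent}{name}: {type(value).__name__}"]


# Hypothetical record, invented for illustration only.
record = {
    "city": "City A",
    "temperature": {"current": 54, "past": [51, 49, 57]},
}
structure = describe(record)
print("\n".join(structure))
```

Each nested map adds one level of indentation, which plays the role of a newly opened window in the chained-window view.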
Raw Data Files: Language bindings simplify access to datasets, but some instructors may want students to use more traditional methods. Therefore, we expose datasets in popular raw
formats: JSON, SQL, and CSV. The SQL database represents
each child map as a distinct table, linked with a primary key.
The CSV table is a flattened representation of the data, using
the key hierarchy to disambiguate column names.
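As a rough illustration of the flattening described above, the following sketch joins nested keys into column names. The `.` separator, the `flatten` helper, and the sample records are assumptions made for this example; the actual CORGIS naming scheme may differ.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested maps into one level, joining keys with '.'."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the key path so column names stay unique.
            flat.update(flatten(value, column))
        else:
            flat[column] = value
    return flat


# Hypothetical records; "name" appears at two levels of the hierarchy.
records = [
    {"name": "City A", "station": {"name": "AAA", "elevation": 650}},
    {"name": "City B", "station": {"name": "BBB", "elevation": 5}},
]
rows = [flatten(r) for r in records]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Without the key path, the two `name` fields would collide; prefixing with the parent key yields the distinct columns `name` and `station.name`.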
The CORGIS Gallery: Language bindings, data files, and
the Visualizer are all accessible through a user-friendly Gallery,
shown in Figure 5. Every dataset is documented with a list of
content tags to help students search for a relevant dataset.
CORGIS can be integrated into introductory courses in a number of ways. In this section, we describe how we have used
the tools in an existing course on Computational Thinking for
non-majors. This course teaches basic programming in a Data
Science context. Much of the CORGIS development has been
driven by the needs of the course.
The major motivational goal of integrating CORGIS is to
align assignments with students’ career goals. However, assign-
CORGIS datasets have interfaces that can be used to slice the data.
The weather dataset, for example, provides “Get Temperature”
and “Get Past Temperatures” that return an integer and a list of
integers, respectively. These specification files can be validated
by the system using a custom compiler.
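To make the idea of an interface concrete, here is a minimal Python sketch of the two weather interfaces named above. Only the interface names and return types come from the text; the `city` parameter, the in-memory `REPORTS` list, and its values are hypothetical.

```python
# Hypothetical slice of a weather dataset: one report per city.
REPORTS = [
    {"city": "City A", "temperature": 54, "past": [51, 49, 57]},
    {"city": "City B", "temperature": 61, "past": [60, 63, 58]},
]


def _find_report(city):
    """Return the report for a city, matching the name case-insensitively."""
    for report in REPORTS:
        if report["city"].lower() == city.lower():
            return report
    raise ValueError(f"No weather report for {city!r}")


def get_temperature(city):
    """'Get Temperature': slice the data down to a single integer."""
    return _find_report(city)["temperature"]


def get_past_temperatures(city):
    """'Get Past Temperatures': slice the data down to a list of integers."""
    # Return a copy so callers cannot mutate the underlying dataset.
    return list(_find_report(city)["past"])


print(get_temperature("City A"))
print(get_past_temperatures("City B"))
```

Students call the functions without needing to know how the underlying records are nested.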
The CORGIS infrastructure uses the specification file and
associated dataset to automatically generate a number of student-ready materials.
Language Libraries: The CORGIS project currently supports the generation of Python, Java, and Racket libraries. Figure 3 gives examples of how a generated library can be used
in these languages. Code and supporting documentation for a
library are generated by filling out templates using the Jinja2
templating library. Each interface in the specification file is turned into a function that executes a query. For performance reasons,
the data is stored in a generated SQLite database rather than in
the original JSON file.
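The generation step can be sketched as follows. This is an assumption-laden illustration rather than the CORGIS generator: `string.Template` from the standard library stands in for Jinja2 so the example has no third-party dependencies, and the `interface` dictionary, table layout, and data values are hypothetical.

```python
import sqlite3
from string import Template

# Stand-in for a Jinja2 template; string.Template keeps the sketch stdlib-only.
FUNCTION_TEMPLATE = Template('''
def ${name}(connection, ${param}):
    """${doc}"""
    row = connection.execute(
        "SELECT ${column} FROM ${table} WHERE ${param} = ?", (${param},)
    ).fetchone()
    if row is None:
        raise ValueError("no row for %r" % (${param},))
    return row[0]
''')

# Hypothetical interface entry, as it might be read from a specification file.
interface = {
    "name": "get_temperature",
    "param": "city",
    "column": "temperature",
    "table": "weather",
    "doc": "Return the current temperature for a city.",
}

# Fill out the template. Identifiers come from the trusted specification;
# the user-supplied value is still passed as a bound SQL parameter.
source = FUNCTION_TEMPLATE.substitute(interface)

# A generator would write `source` to a library file; exec is used here
# only to make the sketch runnable in one piece.
namespace = {}
exec(source, namespace)
get_temperature = namespace["get_temperature"]

# Build a small SQLite database in place of the original JSON file.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE weather (city TEXT PRIMARY KEY, temperature INTEGER)"
)
connection.executemany(
    "INSERT INTO weather VALUES (?, ?)", [("City A", 54), ("City B", 61)]
)

print(get_temperature(connection, "City A"))
```

Storing rows in SQLite lets each generated function fetch only the slice it needs instead of parsing an entire JSON file on every call.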
Figure 2: Partial data map for the Classics library
Figure 3: Example of using a CORGIS library
Figure 4: The CORGIS Visualizer