encoded as a visual property. The system should then generate an effective mapping from the domain of the field to
the range of the visual property. While generating effective
default mappings is not a fundamental aspect of the language, it has proven to be important and is discussed in
more detail in Section 7. Here, we briefly discuss how effective mappings are generated for each retinal property. The
default mappings are illustrated in Figure 4.
shape: Polaris uses the set of shapes recommended by
Cleveland for encoding ordinal data. 7 We have extended this
set of shapes to include several additional shapes to allow a
larger domain of values to be encoded.
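Here is a minimal Python sketch of the ordinal-to-shape mapping. The shape names are placeholders, because the text does not list Cleveland's set or the added shapes. The function name `shape_mapping` is hypothetical. The sketch assigns distinct shapes in domain order and refuses any domain larger than the available set.

```python
# Placeholder names: illustrative only, NOT Cleveland's actual recommended shapes.
BASE_SHAPES = ["circle", "square", "plus", "triangle", "cross"]
# Hypothetical extra shapes that enlarge the encodable domain.
EXTENDED_SHAPES = BASE_SHAPES + ["diamond", "star", "inverted_triangle"]


def shape_mapping(domain):
    """Assign each distinct ordinal value a distinct shape, in domain order."""
    values = list(dict.fromkeys(domain))  # drop duplicates, keep first-seen order
    if len(values) > len(EXTENDED_SHAPES):
        raise ValueError(
            f"{len(values)} values exceed the {len(EXTENDED_SHAPES)} available shapes"
        )
    return dict(zip(values, EXTENDED_SHAPES))
```

For example, `shape_mapping(["East", "West", "Central"])` gives each region its own shape. A ninth distinct value raises an error instead of reusing a shape.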
size: Analysts can use size to encode either an ordinal or
quantitative field. When encoding a quantitative domain as
size, a linear map from the field domain to the area of the
mark is created. The minimum size is chosen so that all
visual properties of a mark with the minimum size can be
perceived. 10 If an ordinal field is encoded as size, the domain
needs to be small, at most four or five values, so that the analyst can discriminate between different categories. 4
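The size rules can be sketched as follows, assuming areas are measured in square pixels. The bounds `min_area` and `max_area` are hypothetical defaults. `min_area` stands in for the smallest mark whose other visual properties remain perceivable. The five-value ordinal cap follows the rule of thumb above.

```python
import math


def size_mapping(lo, hi, min_area=4.0, max_area=100.0):
    """Linear map from a quantitative domain [lo, hi] onto mark area.

    min_area and max_area are hypothetical defaults in square pixels;
    min_area stands in for the smallest perceivable mark.
    """
    if hi == lo:
        # Degenerate domain: every mark gets the same area.
        return lambda v: max_area

    def area(v):
        t = (v - lo) / (hi - lo)
        return min_area + t * (max_area - min_area)

    return area


def ordinal_size_mapping(values, min_area=4.0, max_area=100.0, max_levels=5):
    """Evenly spaced areas for a small ordinal domain (at most max_levels values)."""
    values = list(values)
    if len(values) > max_levels:
        raise ValueError(f"size can only discriminate about {max_levels} categories")
    if len(values) == 1:
        return {values[0]: max_area}
    step = (max_area - min_area) / (len(values) - 1)
    return {v: min_area + i * step for i, v in enumerate(values)}


def radius_for_area(a):
    """Radius of a circular mark with area a (area = pi * r**2)."""
    return math.sqrt(a / math.pi)
```

Because the map targets area rather than radius, a mark with twice the area does not have twice the diameter. `radius_for_area` converts an area back to a radius when drawing circular marks.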
orientation: A key principle in generating mappings of ordinal fields to orientation is that the orientation needs to vary
by at least 30° between categories, 10 thus constraining the
automatically generated mapping to a domain of at most six
categories. For quantitative fields, the orientation varies linearly with the domain of the field.
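The orientation rule can be sketched in the same way. This version assumes that mark orientations repeat every 180°, which holds for a symmetric line or bar glyph. Spreading n categories evenly over 180° separates neighbors by 180/n degrees, and that stays at or above 30° only when n ≤ 6. The function names and the 0° to 90° quantitative range are assumptions.

```python
ORIENTATION_PERIOD = 180.0  # assumption: a symmetric glyph looks the same after 180 degrees
MIN_SEPARATION = 30.0       # minimum angle between ordinal categories (from the text)


def ordinal_orientation(values):
    """Spread ordinal categories evenly over one period of orientation."""
    values = list(values)
    max_categories = int(ORIENTATION_PERIOD // MIN_SEPARATION)  # 6
    if len(values) > max_categories:
        raise ValueError(
            f"at most {max_categories} categories fit {MIN_SEPARATION} degrees apart"
        )
    if not values:
        return {}
    step = ORIENTATION_PERIOD / len(values)  # always >= MIN_SEPARATION here
    return {v: i * step for i, v in enumerate(values)}


def quantitative_orientation(lo, hi, min_angle=0.0, max_angle=90.0):
    """Linear map from [lo, hi] onto an (assumed) 0 to 90 degree range."""
    span = (hi - lo) or 1.0  # avoid dividing by zero on a single-valued domain
    return lambda v: min_angle + (v - lo) / span * (max_angle - min_angle)
```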
color: When encoding an ordinal domain, we use a predefined palette to select the color for each domain entry. The
colors in the palette are well separated in the color spectrum,
predominantly on hue. 19 We have ordered the colors to avoid
adjacent colors with different brightness or substantially
different wavelengths in an attempt to include harmonious
sets of colors in each palette. 4, 10, 19 We additionally reserve a
saturated red for highlighting items that have been selected
or brushed.
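Here is a sketch of ordinal color assignment with a reserved highlight color. The hex values are placeholders, not Polaris's hand-designed palette, and the helper names are hypothetical. The highlight red never appears in the data palette, so a selected mark cannot be confused with a category.

```python
# Placeholder palette: these hex values are illustrative, not Polaris's palette.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#9467bd", "#8c564b", "#17becf"]
# Saturated red reserved for selected or brushed marks; never assigned to a category.
HIGHLIGHT = "#ff0000"


def ordinal_color_mapping(values):
    """Assign palette colors to ordinal values in palette order."""
    values = list(values)
    if len(values) > len(PALETTE):
        raise ValueError(f"palette has only {len(PALETTE)} colors")
    return dict(zip(values, PALETTE))


def mark_color(value, mapping, selected=False):
    """Color for one mark: the highlight overrides the category color."""
    return HIGHLIGHT if selected else mapping[value]
```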
When encoding a quantitative variable, it is important to
vary only one psychophysical variable, such as hue or value.
The default palette we originally used for encoding quantitative data was the isomorphic colormap developed by
Rogowitz. 13 We have since had a color expert hand-design palettes in the HSV space that balance perceptual properties with aesthetics.
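A quantitative ramp can vary only one psychophysical variable by holding hue and saturation fixed and changing only the HSV value channel. The sketch below uses Python's standard `colorsys` module. The specific hue, saturation, and value bounds are arbitrary assumptions and do not reproduce the palettes described above.

```python
import colorsys


def value_ramp(t, hue=0.6, saturation=0.8, v_min=0.25, v_max=1.0):
    """RGB color for t in [0, 1], varying only the HSV value channel."""
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range inputs
    v = v_min + t * (v_max - v_min)
    return colorsys.hsv_to_rgb(hue, saturation, v)  # hue and saturation stay fixed


def quantitative_color(lo, hi, **ramp_args):
    """Map a quantitative domain [lo, hi] onto the single-channel ramp."""
    span = (hi - lo) or 1.0
    return lambda x: value_ramp((x - lo) / span, **ramp_args)
```

Every color on the ramp has the same hue, so changing the data value changes only the brightness the viewer sees.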
Figure 4: The different retinal properties that can be used to encode
fields of the data and examples of the default mappings that are
generated when a given type of data field is encoded in each of the
retinal properties.
4. Data Transformations, Visual Queries, and Generating Database Queries
An important aspect of the Polaris formalism is the unification of graphics and data transformations. A single visual
specification must completely specify both the data retrieval
and the data presentation. Thus, the formalism must support the complete range of data transformations possible in
a query language such as SQL, 17 including the common relational operators: selection, filtering, grouping and aggregation, and sorting. It can be shown that any query expressible
in SQL can be expressed as a specification in the Polaris
formalism. 17
The Polaris interface exposes all capabilities of the
underlying database query language: the state of the interface generates both a visual specification and a statement in
the visual query language. All fields on shelves are inserted
into the SELECT clause. Measure fields are aggregated, while
dimension fields are inserted into a GROUP BY clause,
along with any additional dimension fields specified on a Level-of-Detail shelf. Each dimension can also be sorted, and a different aggregation function can be associated with each
measure; these options are chosen from drop-down menus on
each field on the shelf. A filter shelf represents the predicates in the WHERE clause. Finally, dialog boxes
expose general calculations and joins.
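The mapping from shelf state to a query can be sketched as a small string builder. `ShelfState` and `build_query` are hypothetical names, and the field and table names are made up. For brevity, the sketch splices the filter predicate string in verbatim. A real system should bind filter values as parameters instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ShelfState:
    """Hypothetical snapshot of the shelves; all names are illustrative."""
    dimensions: List[str] = field(default_factory=list)            # dimensions on shelves
    measures: List[Tuple[str, str]] = field(default_factory=list)  # (aggregate, column)
    level_of_detail: List[str] = field(default_factory=list)       # extra grouping dimensions
    sort_by: List[str] = field(default_factory=list)               # dimensions to sort on
    where: Optional[str] = None                                    # filter-shelf predicate


def build_query(state: ShelfState, table: str) -> str:
    """Assemble the SQL statement implied by the shelf state."""
    # Shelf dimensions plus any Level-of-Detail dimensions define the groups.
    group_by = state.dimensions + [
        d for d in state.level_of_detail if d not in state.dimensions
    ]
    # Every field appears in the SELECT list; measures are aggregated.
    select = group_by + [f"{agg}({col}) AS {agg.lower()}_{col}" for agg, col in state.measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if state.where:
        sql += f" WHERE {state.where}"  # spliced verbatim for brevity only
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    if state.sort_by:
        sql += f" ORDER BY {', '.join(state.sort_by)}"
    return sql
```

Suppose `region` is a dimension, `SUM` of `profit` is a measure, and `state` is on the Level-of-Detail shelf. With table `sales`, the builder produces `SELECT region, state, SUM(profit) AS sum_profit FROM sales GROUP BY region, state`.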
Figure 5 shows the overall data flow in Polaris.
Step 1: Selecting the Records: The first phase of the data flow
retrieves records from the database, applying user-defined
filters to select subsets of the database and computing any
user-defined calculations.
For an ordinal field A, the user may specify a subset of the
domain of the field as valid. If filter(A) is the user-selected subset, then a relational predicate expressing the filter for A is:
A in filter(A)
For a quantitative field P, the user may define a subset of the
field’s domain as valid. If min(P) and max(P) are the user-defined extents of this subset, then a relational predicate
expressing the filter for P is:
(P ≥ min(P) and P ≤ max(P))
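Both filter forms are row predicates. The sketch below writes them as Python closures over a dictionary-shaped row. The names `ordinal_filter` and `quantitative_filter` are illustrative and are not part of Polaris.

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def ordinal_filter(name: str, allowed: Iterable[Any]) -> Predicate:
    """Predicate for `A in filter(A)`: keep rows whose value is in the chosen subset."""
    subset = frozenset(allowed)
    return lambda row: row[name] in subset


def quantitative_filter(name: str, lo: float, hi: float) -> Predicate:
    """Predicate for `(P >= min(P) and P <= max(P))`, with inclusive bounds."""
    return lambda row: lo <= row[name] <= hi
```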
We can define the relational predicate filters as the conjunction of all of the individual field filters. Then, the first stage
of the data transformation network is equivalent to the SQL
statement:
SELECT *
WHERE {filters}
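The first stage can be sketched against SQLite, which ships with Python as the `sqlite3` module. The sketch builds the conjunction of field filters as a parameterized `WHERE` clause. Unlike the statement above, it names the source table explicitly in a `FROM` clause. The helper names `where_clause` and `select_records` are hypothetical, and any table or column names passed to them are assumptions.

```python
import sqlite3
from typing import Any, Dict, Iterable, List, Tuple


def where_clause(
    ordinal: Dict[str, Iterable[Any]],
    quantitative: Dict[str, Tuple[float, float]],
) -> Tuple[str, List[Any]]:
    """Conjunction of all field filters as a parameterized WHERE clause.

    Column names are assumed to be trusted identifiers; filter values are
    bound as parameters rather than spliced into the SQL text.
    """
    terms: List[str] = []
    params: List[Any] = []
    for col, allowed in ordinal.items():
        allowed = list(allowed)
        terms.append(f"{col} IN ({', '.join(['?'] * len(allowed))})")  # A in filter(A)
        params.extend(allowed)
    for col, (lo, hi) in quantitative.items():
        terms.append(f"({col} >= ? AND {col} <= ?)")  # min(P) <= P <= max(P)
        params.extend([lo, hi])
    # With no filters at all, every record is selected.
    return (" AND ".join(terms) if terms else "1"), params


def select_records(conn: sqlite3.Connection, table: str, ordinal, quantitative):
    """Stage 1: SELECT * FROM table WHERE {filters}."""
    clause, params = where_clause(ordinal, quantitative)
    return conn.execute(f"SELECT * FROM {table} WHERE {clause}", params).fetchall()
```

For example, `where_clause({"region": ["East"]}, {"profit": (1, 2)})` returns the clause `region IN (?) AND (profit >= ? AND profit <= ?)` with parameters `["East", 1, 2]`.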
It is possible within the complete formalism to define more
sophisticated filtering, such as filters on the cross-product
of multiple fields or filters with ordering dependencies
(filter A is computed relative to filter B).
Step 2: Partitioning the Records into Panes: The second