We can classify fields in a database as nominal, ordinal, quantitative, or interval [4, 16]. This classification is
the field’s scale. Polaris reduces this categorization to
ordinal and quantitative by treating intervals as quantitative and assigning an ordering to the nominal fields
to treat them as ordinal. A field’s scale affects its visual
representation. Quantitative fields are continuous, and
are shown as axes or smoothly varying values. Ordinal
scales are represented discretely, as headers or different
The fields within a relational table can also be partitioned into two types: dimensions and measures. This
classification is the field’s role. Dimensions and measures are similar to independent and dependent variables in traditional analysis. For example, a product
name or type would be a dimension while the product
price or size would be a measure. The field’s role determines how the query is generated. Measures are computed using an aggregation function and dimensions
form the groups to be aggregated.
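As a minimal sketch of how roles could drive query generation, the following groups records by the dimension fields and applies an aggregation function to the measure. The function name `aggregate`, the field names, and the sample records are assumptions made for illustration; they are not taken from the Polaris implementation.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, dimensions, measure, agg=mean):
    """Group records by the dimension fields, then aggregate the measure."""
    groups = defaultdict(list)
    for rec in records:
        # The tuple of dimension values identifies the group.
        key = tuple(rec[d] for d in dimensions)
        groups[key].append(rec[measure])
    return {key: agg(values) for key, values in groups.items()}


# Hypothetical sample data, not from the paper.
sales = [
    {"product_type": "coffee", "price": 4.0},
    {"product_type": "coffee", "price": 6.0},
    {"product_type": "tea", "price": 3.0},
]

# product_type acts as a dimension, price as a measure.
avg_price = aggregate(sales, ["product_type"], "price")
```

Here the dimension plays the part of a GROUP BY column and the measure is the aggregated value; passing a different `agg` (for example `max` or `sum`) changes only the aggregation function.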
Polaris originally treated ordinal fields as dimensions and
quantitative fields as measures. With experience, however,
we have found that a field’s scale and role are orthogonal,
and may change depending on the question. For example,
when asking the question “What is the average age of people
purchasing a product?” the field Age is acting as a measure.
However, when asking the question “What is the average
amount spent classified by customer age?” then Age is acting as a dimension. The current implementation of Tableau
uses simple heuristics based on a field’s data type and
domain cardinality to determine a field’s default role and
scale, but allows both to be easily changed.
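The two questions above can be sketched as two group-by queries over the same records, with Age changing role between them. The helper `group_mean`, the field names, and the sample data are hypothetical and serve only to illustrate the point.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical purchase records, not from the paper.
purchases = [
    {"product": "A", "age": 30, "amount": 10.0},
    {"product": "A", "age": 40, "amount": 20.0},
    {"product": "B", "age": 30, "amount": 30.0},
]


def group_mean(records, dimension, measure):
    """Average the measure within each distinct value of the dimension."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[dimension]].append(rec[measure])
    return {key: mean(values) for key, values in groups.items()}


# "What is the average age of people purchasing a product?"
# Age is the measure; product is the dimension.
avg_age_by_product = group_mean(purchases, "product", "age")

# "What is the average amount spent classified by customer age?"
# Age is now the dimension; amount is the measure.
avg_amount_by_age = group_mean(purchases, "age", "amount")
```

The same field is passed once as the `measure` argument and once as the `dimension` argument, which is why a field's role cannot be fixed by its scale alone.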
To effectively support the analysis process in large multidimensional databases, an analysis tool must meet several demands:
• Exploratory interface: Analysts must be able to rapidly
and incrementally change what data they are viewing
and how they are viewing that data as they explore
• Multiple display types: Analysis consists of different
tasks such as discovering correlations between variables, finding patterns, and locating outliers. An analysis tool must be able to generate displays suited to these tasks.
• Data-dense displays: The databases typically contain a
large number of records and dimensions. Analysts need
to be able to create visualizations that will simultaneously
display many dimensions of large subsets of the data.
Polaris addresses these demands by providing an interface for rapidly and incrementally generating table-based
displays. In Polaris, a table consists of a number of rows,
columns, and layers. Each table axis may contain multiple
nested dimensions. Each table entry, or pane, contains a set
of records that are visually encoded as a set of marks to create a graphic.
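One way to picture this table structure is as a partition of the records keyed by (row, column, layer), where each key is a tuple so that an axis can hold several nested dimensions. The sketch below assumes a hypothetical helper `partition_into_panes` and invented sample records; it illustrates the idea and does not describe the actual Polaris data structures.

```python
from collections import defaultdict


def partition_into_panes(records, row_fields, column_fields, layer_fields=()):
    """Assign each record to the pane identified by its row, column, and layer values."""
    panes = defaultdict(list)
    for rec in records:
        # Tuples allow several nested dimensions on one table axis.
        row = tuple(rec[f] for f in row_fields)
        col = tuple(rec[f] for f in column_fields)
        layer = tuple(rec[f] for f in layer_fields)
        panes[(row, col, layer)].append(rec)
    return dict(panes)


# Hypothetical sample data, not from the paper.
records = [
    {"quarter": "Q1", "product_type": "coffee", "sales": 100, "profit": 20},
    {"quarter": "Q1", "product_type": "tea", "sales": 80, "profit": 10},
    {"quarter": "Q2", "product_type": "coffee", "sales": 120, "profit": 30},
    {"quarter": "Q2", "product_type": "coffee", "sales": 90, "profit": 15},
]

# Rows by product type, columns by quarter, a single implicit layer.
panes = partition_into_panes(records, ["product_type"], ["quarter"])
```

Each value in `panes` is the set of records for one pane, ready to be encoded as marks.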
Several characteristics of tables make them particularly
effective for displaying multidimensional data:
• Multivariate: The table structure can encode multiple
dimensions, enabling the display of high-dimensional data.
• Comparative: Tables generate small-multiple displays
of information, which are easily compared to expose
patterns and trends across dimensions [20].
• Familiar: Table-based displays have an extensive history. Statisticians are accustomed to using tabular displays of graphs, such as scatterplot matrices and Trellis
displays, for analysis [2, 7, 20].
Figure 1 shows the Polaris user interface. In this example,
the analyst has constructed a matrix of scatterplots showing sales versus profit for different product types in different
quarters. The primary interaction technique is to drag and drop fields from the database schema onto shelves throughout the display. We call a given configuration of fields on
shelves a visual specification. The visual specification is
converted into the language using simple operations. The
specification determines the analysis and visualization
operations to be performed by the system, defining:
• The mapping of data sources to layers. Multiple data
sources may be combined in a single Polaris visualization.
Each data source maps to a separate layer or set of layers.
• The number of rows, columns, and layers in the table
and their relative orders (left to right as well as back to
front). The database dimensions assigned to rows are
specified by the fields on the y shelf, columns by fields
on the x shelf, and layers by fields on the layer (z) shelf.
Multiple fields may be dragged onto each shelf to show
• The selection of records from the database and the partitioning of records into different layers and panes.
• The grouping of data within a pane and the computation of statistical properties, aggregates, and other
derived fields. Records may also be sorted into a given order.
• The type of graphic displayed in each table pane. Each
graphic consists of a set of marks, one mark per record
in that pane.
• The mapping of data fields to retinal properties of the
marks in the graphics. The mappings used for any given
visualization are shown in a set of automatically generated legends.
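The elements of a visual specification listed above can be sketched as a small record type. The class `VisualSpecification`, its attribute names, and the helper `marks_for_pane` are assumptions made for illustration and do not reflect the real Polaris implementation. For brevity the sketch copies raw field values into the retinal properties, whereas a real system would map them through a scale to colors, sizes, and so on.

```python
from dataclasses import dataclass, field


@dataclass
class VisualSpecification:
    """Hypothetical container for one configuration of fields on shelves."""
    x_shelf: list = field(default_factory=list)      # fields defining columns
    y_shelf: list = field(default_factory=list)      # fields defining rows
    layer_shelf: list = field(default_factory=list)  # fields defining layers (z)
    mark_type: str = "circle"                        # graphic type used in each pane
    retinal: dict = field(default_factory=dict)      # retinal property -> field name


def marks_for_pane(spec, pane_records):
    """Create one mark per record, attaching the retinal encodings from the spec."""
    marks = []
    for rec in pane_records:
        mark = {"type": spec.mark_type}
        for prop, field_name in spec.retinal.items():
            # Simplification: the raw value stands in for the visual value.
            mark[prop] = rec[field_name]
        marks.append(mark)
    return marks


spec = VisualSpecification(
    x_shelf=["quarter"],
    y_shelf=["product_type"],
    retinal={"color": "region", "size": "sales"},
)

# Hypothetical records belonging to a single pane.
pane = [
    {"quarter": "Q1", "product_type": "coffee", "region": "east", "sales": 100},
    {"quarter": "Q1", "product_type": "coffee", "region": "west", "sales": 60},
]
marks = marks_for_pane(spec, pane)
```

Keeping the specification as plain data separates what the analyst has configured from how the system queries and draws it.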
Analysts can interact with the resulting visualizations in several ways. Each mark represents a tuple, so selecting a single
mark in a graphic by clicking on it pops up a detail window
that displays user-specified field values for the tuples corresponding to that mark. The tuples represented by a set of
marks can be cut and pasted into a spreadsheet by selecting
the marks representing the tuples. Analysts can draw rubber
bands around a set of marks to brush or highlight related
records, either within a single table or between multiple
In Section 3, we describe how the visual specification is used
to generate graphics. In Section 4, we describe the supported
data transformations and how the visual specifications are