In the 6 years since this work was originally published, we
have gained considerable experience with the formalism
and the interface. In that time, the technology has been commercialized and extended by Tableau Software as Tableau
Desktop and is used by thousands of companies and tens of
thousands of users. The system has also been adapted to the
web, so it is possible to perform analysis within a browser.
The uses are diverse, ranging from disease research in the
jungles of Central America to marketing analysis in Fortune
50 companies to usability analysis by video game designers,
and the data sizes range from small spreadsheets to billions
of rows of data. The many types of users indicate the ubiquity of data and the demand for new tools. This experience
has emphasized three points to us: (1) the importance of a
formal approach, (2) the importance of an architecture that
leverages database technology rather than replaces it, and
(3) the importance of building effective defaults into the
One early question was: “Do we need a formalism?”
Most visualization systems have predefined types of charts
and use wizards to help the user construct graphs. Having
a language allows us to generate an unlimited number of
different types of graphics. Restricting the set of views to
a small set limits the power of visualization; this would be
like building into a query language a small set of predefined
queries. Experience has shown that this flexibility makes
it possible to incrementally build new views, which is key
to smoothly supporting the analysis process. Both of these
aspects of Polaris are enabled by the formal nature of the
algebra, where every addition or deletion leads to a new algebraic statement.
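To make the compositional idea concrete, here is a minimal sketch of two table-algebra operators, concatenation (+) and cross (×), acting on ordinal fields. It assumes each operand is an ordered list of tuples, one tuple per row or column header. The function names and the example domains (quarters, products, regions) are hypothetical. Each edit a user makes wraps the previous expression in one more operator, so every intermediate state is itself a valid algebraic statement.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical representation: an ordered list of header tuples.
Headers = List[Tuple[str, ...]]


def ordinal(domain: List[str]) -> Headers:
    """Operand for an ordinal field: one single-element tuple per domain value."""
    return [(value,) for value in domain]


def concat(a: Headers, b: Headers) -> Headers:
    """A + B: ordered union, with B's tuples appended after A's (duplicates dropped)."""
    return a + [t for t in b if t not in a]


def cross(a: Headers, b: Headers) -> Headers:
    """A x B: each tuple of A joined with each tuple of B, preserving order."""
    return [x + y for x, y in product(a, b)]


quarters = ordinal(["Q1", "Q2"])
products = ordinal(["Coffee", "Tea"])
regions = ordinal(["East", "West"])

# Incremental construction: each step is a new, complete algebraic statement.
step1 = quarters                          # Quarter
step2 = cross(step1, products)            # Quarter x Product
step3 = cross(step2, regions)             # (Quarter x Product) x Region
side_by_side = concat(products, regions)  # Product + Region
```

Here `step2` yields four headers, from `("Q1", "Coffee")` through `("Q2", "Tea")`. Removing a field from a shelf simply drops one operator again.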
The formalism also enables us to unify the specification of the visualization with the database query: users can
change the query used to fetch the data and their view of it
simultaneously. In subsequent work, we have proved that
the language is complete; that is, it is possible to generate
any statement in the relational algebra. A major problem
with many visual interfaces is that they restrict the types of
queries that can be formed.
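A minimal sketch of how a specification might compile to a single aggregate query follows. It assumes a much simplified spec: a list of ordinal fields placed on shelves, which become GROUP BY keys, and a mapping from quantitative fields to aggregate functions. `compile_query` and the field names are hypothetical, and identifiers are neither quoted nor escaped. A real compiler would also need to handle filters, sorting, and per-pane queries.

```python
from typing import Dict, List


def compile_query(table: str, dimensions: List[str], measures: Dict[str, str]) -> str:
    """Translate a simplified visual spec into one SQL statement.

    dimensions: ordinal fields on the shelves, used as grouping keys.
    measures:   quantitative field -> aggregate, e.g. {"profit": "SUM"}.
    """
    if not dimensions and not measures:
        raise ValueError("the specification contains no fields")
    aggregates = [f"{agg}({field}) AS {agg.lower()}_{field}" for field, agg in measures.items()]
    columns = ", ".join(dimensions + aggregates)
    if not measures:
        # Only ordinal fields: fetch the distinct header combinations.
        return f"SELECT DISTINCT {columns} FROM {table}"
    sql = f"SELECT {columns} FROM {table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


before = compile_query("sales", ["quarter"], {"profit": "SUM"})
# Dragging "product" onto a shelf changes the view and the query together.
after = compile_query("sales", ["quarter", "product"], {"profit": "SUM"})
```

Because the view and the query come from the same specification, the single edit that turns `before` into `after` changes both at once.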
This unification of visualization and database queries is
also a key architectural decision that makes it possible to
use our system as a front-end to large parallel database servers. This makes it easy to access important data in existing
data sources, to leverage high performance database technology (e.g., database appliances, massively parallel computation, column stores), and to avoid data replication and
application-specific data silos. Why move a terabyte of data
if you don’t have to?
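A practical benefit of compiling to queries is that aggregation runs where the data lives. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a large database server, with a made-up toy table. Only the grouped summary rows, not the raw records, are returned to the client.

```python
import sqlite3

# Toy stand-in for a large fact table (hypothetical data).
RECORDS = [
    ("East", "Coffee", 10.0),
    ("East", "Tea", 5.0),
    ("East", "Coffee", 2.5),
    ("West", "Coffee", 7.0),
    ("West", "Tea", 3.0),
    ("West", "Tea", 1.0),
]


def regional_totals(conn: sqlite3.Connection):
    """Let the database aggregate; only one row per region is returned."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", RECORDS)
summary = regional_totals(conn)
```

With a terabyte-scale table the ratio between raw and returned rows grows enormously, which is why pushing the GROUP BY to the server matters.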
One potential issue with a compositional language is
that it creates a large space of possible visualizations. While
many are effective and aesthetically pleasing, many are not.
Thus, choosing default graphics is an important part of any
production system and allows for additional succinctness in
the language. However, the issue is not just with choosing
default graphics. Generating effective visual mappings (e.g.,
color, shape) by default is not a fundamental aspect of the
language, but is equally important. Effective defaults enable
users to focus on their task and questions rather than the
details of color or shape selection, especially since many
users are not trained as graphic designers or psychologists.
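As a sketch of what such defaults might look like, the rules below pick a mark type from the types of the fields on the x and y axes. They also pick a color encoding from the type of the field on the color shelf. These particular rules, the palette, and the blue ramp are illustrative assumptions, not the rules that Polaris or Tableau actually use.

```python
from typing import List, Sequence

# Hypothetical categorical palette for ordinal fields.
PALETTE = ["#4e79a7", "#f28e2b", "#e15759", "#76b7b2"]
# Hypothetical sequential ramp endpoints (light to dark blue) for quantitative fields.
RAMP_START, RAMP_END = (0xDE, 0xEB, 0xF7), (0x08, 0x51, 0x9C)


def default_mark(x_type: str, y_type: str) -> str:
    """Choose a mark from field types: 'O' = ordinal, 'Q' = quantitative."""
    if x_type == "Q" and y_type == "Q":
        return "circle"  # scatterplot
    if "Q" in (x_type, y_type):
        return "bar"     # one ordinal axis, one quantitative axis
    return "text"        # both ordinal: a text table


def default_colors(field_type: str, values: Sequence) -> List[str]:
    """Map each value to a hex color: palette for 'O', linear ramp for 'Q'."""
    if field_type == "O":
        distinct = list(dict.fromkeys(values))  # first-seen order
        lookup = {v: PALETTE[i % len(PALETTE)] for i, v in enumerate(distinct)}
        return [lookup[v] for v in values]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    colors = []
    for v in values:
        t = (v - lo) / span
        rgb = (round(s + (e - s) * t) for s, e in zip(RAMP_START, RAMP_END))
        colors.append("#" + "".join(f"{c:02x}" for c in rgb))
    return colors
```

The user never has to choose a mark or a color scheme to see a first, reasonable view. Every default remains an ordinary choice that can be overridden.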
7. RELATED WORK
Work related to Polaris falls into two categories: formal graphical specifications and database exploration tools.
7.1. Formal Graphical Specifications
We have built on several researchers’ insights
into the formal properties of graphic communication, such
as Bertin’s Semiology of Graphics [4], Cleveland’s experimental results on the perception of data [7, 8], Wilkinson’s formalism for statistical graphics [22], and Mackinlay’s APT system [12].
However, the Polaris formalism is innovative in several ways.
One key aspect of our approach is that all specifications can
be compiled directly into queries. Existing formalisms do
not consider the generation of queries to be related to the
presentation of information. Another innovation is the use
of an algebra to describe table-based displays. Tables are
particularly effective for displaying multidimensional data,
as multiple dimensions of the data can be explicitly encoded
in the structure of the table. Finally, our formalism is the
basis for several interactive tools for analyzing and exploring large data warehouses, and this usage has shaped the
development of the formalism.
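The table algebra's nest operator shows how dimensional structure can be read off the data itself. In the sketch below, cross enumerates every combination of two ordinal domains. Nest keeps only the combinations that actually occur in the records, in the same order. The record layout and the coffee-shop data are hypothetical, and for simplicity domains are taken as sorted distinct values.

```python
from itertools import product
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical records: each product belongs to exactly one category.
RECORDS: List[Record] = [
    {"category": "Coffee", "product": "Latte"},
    {"category": "Tea", "product": "Green"},
    {"category": "Coffee", "product": "Espresso"},
    {"category": "Coffee", "product": "Latte"},
]


def domain(field: str, records: List[Record]) -> List[str]:
    """Sorted distinct values of a field."""
    return sorted({r[field] for r in records})


def cross(a: str, b: str, records: List[Record]) -> List[Tuple[str, str]]:
    """A x B: all pairs of domain values, whether or not they occur together."""
    return list(product(domain(a, records), domain(b, records)))


def nest(a: str, b: str, records: List[Record]) -> List[Tuple[str, str]]:
    """A / B: only the pairs that appear in some record, kept in cross order."""
    present = {(r[a], r[b]) for r in records}
    return [pair for pair in cross(a, b, records) if pair in present]


all_cells = cross("category", "product", RECORDS)
real_cells = nest("category", "product", RECORDS)
```

Crossing category with product produces six row headers, half of them empty (such as Tea/Latte). Nesting yields only the three meaningful ones.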
7.2. Database Exploration Tools
The second area of related work is visual query and database exploration tools. Academic projects such as Visage [14],
DEVise [11], and Tioga-2 [1] have focused on developing visualization environments that support interactive database exploration through visual queries. Users construct queries and
visualizations through their interactions with the visualization system interface. These systems have flexible mechanisms for mapping query results to graphs, and support
mapping database records to retinal properties. However,
none of these systems is based on an expressive formal language for graphics, nor do they leverage table-based organizations of their visualizations.
Finally, existing systems such as XmdvTool [21], Spotfire [15],
and XGobi [6] have taken the approach of providing a set of
predefined visualizations, such as scatterplots and parallel
coordinates. These views are augmented with interaction
techniques, such as brushing and zooming, which can be
used to refine the queries. We feel that this approach is much
more limiting than providing the user with a set of building
blocks that can be used to interactively construct and refine
a wide range of displays to suit any analysis task.
8. CONCLUSION
We have presented Polaris, a visual query language for databases and a graphical interface for authoring queries in the
language. The Polaris formalism uses succinct visual specifications to describe a wide range of table-based visualizations
of multidimensional information. Visual specifications can
be compiled into both the queries and the drawing commands necessary to generate the displays, thus unifying analysis and visualization into a single visual query language.