figure 5: the transformations and data flow within Polaris. the visual specification generates queries to the database to select subsets of the
data for analysis, then to filter, sort, and group the results into panes, and then finally to group, sort and aggregate the data within panes.
phase of the data flow partitions the retrieved records into
groups corresponding to each pane in the table. As we discussed in Section 3. 1, the normalized set form of the table axis
expressions determines the table configuration. The table is
partitioned into rows, columns, and layers corresponding to
the entries in these sets.
The ordinal values in each set entry define the criteria by
which records will be sorted into each row, column, and layer.
Let Row(i ) be the predicate that represents the selection criteria for the ith row, Column( j ) be the predicate for the jth column, and Layer(k) the predicate for the kth layer. For example,
if the y-axis of the table is defined by the normalized set:
{abP, abP, abP, abP}
11 12 21 22
then there are four rows in the table, each defined by an
entry in this set, and Row would be defined as:
Row( 1) = (A = a and B = b )
11
Row( 2) = (A = a and B = b )
12
Row( 3) = (A = a and B = b )
21
Row( 4) = (A = a and B = b )
22
Given these definitions, the records to be partitioned into the
pane at the intersection of the ith row, the jth column, and
the kth layer can be retrieved with the following query:
SELECT
WHERE {Row(i) and Column(j) and
Layer(k)}
To generate the groups of records corresponding to each
of the panes, we must iterate over the table executing this
SELECT statement for each pane, which is clearly nonopti-mal. Various optimizations are discussed in. 17
step 3: transforming records within the Panes: The last
phase of the data flow is the transformation of the records
in each pane. If the visual specification includes aggregation, then each measure in the database schema must
be assigned an aggregation operator. If the user has not
selected an aggregation operator for a measure, that measure is assigned the default aggregation operator (SUM).
We define the term aggregates as the list of the aggregations that need to be computed. For example, if the
database contains the quantitative fields Profit, Sales,
and Payroll, and the user has explicitly specified that the
average of Sales should be computed, then aggregates is
defined as:
aggregates =
SUM(Profit),AVG(Sales),SUM(Payroll)
Aggregate field filters (for example, SUM(Profit) > 500)
could not be evaluated in Step 1 with all of the other filters
because the aggregates had not yet been computed. Thus,
those filters must be applied in this phase. We define the
relational predicate filters as in Step 1 for unaggregated
fields.
Additionally, we define the following lists:
G: the field names in the grouping shelf,
S: the field names in the sorting shelf, and
dim: the dimensions in the database.
The necessary transformation can then be expressed by the
SQL statement:
SELECT {dim},{aggregates}
GROUP BY {G}
HAVING {filters}
ORDER BY {S}
If no aggregate fields are included in the visual specification,
then the remaining transformation simply sorts the records
into drawing order:
SELECT *
ORDER BY {S}