define a table entry to be reachable from a set of previously
reachable strings if the entry can be generated from the
set of reachable strings using the Generate procedure of
Section 3. 2. The rest of the reachability algorithm operates
just as before.
Procedure Intersect: A basic ingredient of the
Intersect procedure for syntactic transformations is a
method to intersect two Dag constructs, each representing a
set of trace expressions. We replace this by a method to intersect two tuples (h~ 1, ht 1, Progs1) and (h~ 2, ht 2, Progs2), each
representing a set of extended trace expressions. The
tuple after intersection is (h~ 1 ´ h~ 2, (ht 1,ht 2), Progs12), where
Progs12((h~ 1, h~ 2) ) is given by the intersection of Progs1(h~ 1)
and Progs2(h~ 2).
ranking: We prefer expressions of smaller depth (fewer
nested chains of Select expressions) and ones that match
longer strings in table entries for indexing. We prefer lookup
expressions that use distinct tables (for join queries) as
opposed to using the same table twice. We prefer conditionals with fewer predicates. We prefer predicates that compare columns with other table entries or input variables (as
opposed to comparing columns with constant strings).
We implemented our algorithm as an extension to the
Excel add-in (Section 3. 2) and evaluated it successfully over
more than 50 benchmark problems obtained from various
help forums and the Excel product team. For each benchmark, our implementation learned the desired transformation in < 10 s (88% of them taking < 1 s each) requiring at most
three input–output examples (with 70% of them requiring
only one example). The data structure had size between 100
and 2,000 (measured as the number of terminal symbols in
the data-structure syntax), with an average size of 600, and
typically represented 1020 expressions.
5. tABLE LAyout tRAnSFoRMAtionS
End users often transform a spreadsheet table not by changing the data stored in the cells of a table, but instead by
changing how the cells are grouped or arranged. In other
words, users often transform the layout of a table. 8
Example 5. The following example input table and subsequent
example output table were provided by a novice on an Excel
user help thread to specify a layout transformation:
Andrew
Ben
Carl
Qual 1
01.02.2003
31.08.2001
Qual 2
27.06.2008
18.04.2003
Qual 3
06.04.2007
05.07.2004
09.12.2009
Andrew
Andrew
Andrew
Ben
Ben
Carl
Carl
Qual 1
01.02.2003
27.06.2008
06.04.2007
31.08.2001
05.07.2004
18.04.2003
09.12.2009
The example input contains a set of dates on which tests were
given, where each date is in a row corresponding to the name
of the test taker, and in a column corresponding to the name of
the test. For every date, the user needs to produce a row in the
output table containing the name of the test taker, the name of
the test, and the date on which the test was taken. If a date cell
in the input is empty, then no corresponding row should be pro-
duced in the output.
5. 1. Domain-specific language
We may view every program P that transforms the layout of
a table as constructing a map mP from the cells of an input
table to the coordinates of the output table. For a cell c in an
input table, if mP(c) = (row, col), P fills the cell in the output
table at the coordinate (row, col) with the data in c. A program
in our language of layout transformations is defined syntactically as a finite collection of component programs, each of
which builds a map from input cells to output coordinates
(Figure 4: table program). We designed our language on the
principle that most layout transformations can be implemented by a set of component programs that construct their
map using one of the two complementary procedures:
filtering and associating.
When a component program filters, it scans the cells
of the input table in row-major order, selects a subset of the cells, and maps them in order to a subrange
of the coordinates in the output table. A filter program
Filter(j, SEQi,j,k) (Figure 4: filter program) is a mapping
condition j, which is a function whose body is a conjunction
of predicates over input cells drawn from a fixed set and an
output sequencer SEQi,j,k, where i, j, and k are nonnegative
integers. For a mapping condition j and sequencer SEQi,j,k,
the filter program Filter(j, SEQi,j,k) scans an input table
and maps each cell that satisfies j to the coordinates in the
output table between columns i and j, starting at row k, in
row-major order.
For the tables in Example 5, the filter program F1 =
Filter(λc.(c.data ≠ “” ∧ c.col ≠ 1 ∧ c.row ≠ 1), SEQ3, 3, 1)
maps each date, that is, each nonempty cell not in column 1
and not in row 1, to its corresponding cell in column 3 of the
output table, starting at row 1. Call this map mF1.
A table program can also construct a map using spatial
relationships between cells in the input table and spatial
relationships between coordinates in the output table; we
call this construction association. When a table program
associates, it takes a cell c in the input table mapped by
some filter program F, picks a cell c1 in the input table whose
coordinate is related to c, finds the coordinate mF(c) that c
maps to under mF, picks a coordinate d1 whose coordinate is
related to mF(c), and maps c1 to d1.
An associative program A = Assoc(F, s0, s1) (Figure 4:
Associative program) is constructed from a filter program
F and two spatial functions s0 and s1, each of which may be of
Figure 4. Syntax of layout transformations.
table program P := TabProg({ki}i)
Component program k := f | a
filter program f := Filter(ϕ, seQi, j, k)
associative program a := Assoc(f, s1, s2)
spatial function s := RelColi | RelRowj