atomic expr f :=
select expr et :=
boolean condition g :=
predicate h :=
SubStr(et, p1, p2) | ConstStr(s) | et
υi | Select(Col, Tab, g)
h1 ∧ . . . ∧ hn
Col = s | Col = e
~et :=
~f :=
b :=
g :=
h :=
(a)
( h, ht, Progs) where progs : h→ 2f
υi | Select(Col, Tab, b)
{~ gi} i
h 1∧...∧h n
Col = s | Col = η | Col = {s, η}
(b)
The expression f4 looks up the Id of the item (present in
v1) from the MarkupRec table and f5 generates a substring
of the date present in v2, which are then used together to
lookup the Price of the item from the CostRec table (f1).
The expression f6 looks up the Markup percentage of the
item from the MarkupRec table and f2 generates a substring of this lookup value by extracting the first numeric
token (thus removing the sign). Similarly, the expression f3 generates a substring of f1, removing the $ symbol.
Finally, the top-level expression concatenates the strings
obtained from expressions f1, f2, and f3 with constant strings
“+0.” and “*”.
This extended language also enables manipulation of
strings that represent standard data types, whose semantic meaning can be encoded as a database of relational
tables. For example, consider the following date manipulation task.
Example 4 (Date Manipulation). An Excel user wanted
to convert dates from one format to another, and the fixed set
of hard-coded date formats supported by Excel 2010 do not
match the input and output formats. Thus, the user solicited
help on a forum.
input v1
6-3-2008
3-26-2010
8-1-2009
9-24-2007
output
Jun 3rd, 2008
Mar 26th, 2010
Aug 1st, 2009
Sep 24th, 2007
We can encode the required background knowledge for
the date data type in two tables, namely a Month table
with 12 entries: ( 1, January), …, ( 12, December) and a
DateOrd table with 31 entries ( 1, st), ( 2, nd), …, ( 31, st).
The desired transformation is represented in our language as
Concatenate(SubStr(Select(M W, Month, MN = e1),
Pos(Start Tok, e, 1), CPos( 3) ), ConstStr(“ ”), e2,
Select(Ord, DateOrd, Num = e2), ConstStr(“, ”), e3)
where e1 = SubStr2(v1, NumTok, 1), e2 = SubStr2(v1,
NumTok, 2), e3 = SubStr2(v1, NumTok, 3). (MW,MN) and
(Num,Ord) denote the columns of the Month and DateOrd
tables, respectively.
4. 2. Synthesis algorithm
We now describe the key extensions to the synthesis algorithm for syntactic transformations (Section 3. 2) to obtain
the synthesis algorithm for semantic transformations.
data structure: Figure 3(b) describes the data structure that
succinctly represents the large set of programs in the semantic
transformation language that are consistent with a given input–
output example. The data structure consists of a generalized
expression e t, generalized Boolean condition g~, and general-izedpredicateh (
which,respectively,denoteasetofselectexpres-sions, a set of Boolean conditions g, and a set of predicates h).
The generalized expression e t is represented using a tuple (h~, ht,
Progs) where h~ denotes a set of nodes containing a distinct target node ht (representing the output string), and Progs : h~ ® 2f
maps each node h Î h~ to a set consisting of input variables vi or
generalized select expressions Select(Col, Tab, B). A generalized Boolean condition g i corresponds to some primary key of
table T and is a conjunction of generalized predicates h j, where
each h j is an equality comparison of the jth column of the corresponding primary key with a constant string s or some node h~
or both. The two key aspects of this data structure are (i) the use
of intermediate nodes for sharing sub-expressions to represent
an exponential number of expressions in polynomial space and
(ii) the use of Conjunctive Normal Form (CNF) form of Boolean
conditions to represent an exponential number of conditionals
g in polynomial space.
Procedure Generate: We first consider the simpler
case where there are no syntactic manipulations on the
table lookups and the lookups are performed using exact
string matches, that is, the predicate h is Col = et instead
of Col = e. The Generate procedure operates by iteratively
computing a set of nodes (h~), where each node h Î h~ represents a string val(h) that corresponds to some table entry
or an input string. Generate performs an iterative forward
reachability analysis of the string values that can be generated in a single step (i.e., using a single Select expression)
from the string values computed in the previous step based
on string equality and assigns the Select expression to the
Progs map of the corresponding node. The base case of the
procedure creates a node for each of the input string variables. After performing the analysis for a bounded number
of iterations, the procedure returns the set of select expressions of the node that corresponds to the output string s,
that is, Progs[val− 1(s)].
The Generate procedure for the general case, which
also includes syntactic manipulations on table lookups,
requires a relaxation of the above-mentioned reachability
criterion of strings that is based on string equality. We now