Consider this simple LINQ query:
from user in users
where user.Id == 1
select user;
Though this is probably the very
simplest LINQ query possible, the
compiler translates it to something
that looks drastically different and
quite complex, as shown in Figure 2.
This is one of the more problematic
aspects of writing a LINQ provider. You
get something that does not look very
much like what the user provided.
If you are used to writing compilers, this all makes perfect sense.
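To get a feel for the gap between the query text and what a provider actually receives, the following minimal sketch compiles the trivial query against an in-memory IQueryable and prints the resulting tree. The User class and the TrivialQueryDemo wrapper are placeholders invented for illustration; only the System.Linq and System.Linq.Expressions types are real.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Placeholder entity, assumed for this example only.
public class User
{
    public int Id { get; set; }
}

public static class TrivialQueryDemo
{
    // Builds the query from the text against an in-memory source and
    // returns the expression tree the compiler generated for it.
    public static Expression BuildTrivialQuery()
    {
        IQueryable<User> users = new[] { new User { Id = 1 } }.AsQueryable();

        var query = from user in users
                    where user.Id == 1
                    select user;

        return query.Expression;
    }

    public static void Main()
    {
        // The query syntax has become a call to a Queryable method.
        var call = (MethodCallExpression)BuildTrivialQuery();
        Console.WriteLine(call.Method.DeclaringType + "." + call.Method.Name);

        // The predicate arrives as a quoted lambda, not as "where" syntax.
        var quote = (UnaryExpression)call.Arguments[1];
        var lambda = (LambdaExpression)quote.Operand;
        Console.WriteLine(lambda.Body.NodeType);

        // Compact rendering of the whole tree.
        Console.WriteLine(call);
    }
}
```

Note that the trailing "select user" leaves no trace in the tree: the compiler drops a select that merely returns the range variable, so the provider sees only a call to Queryable.Where.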
When you have more complex queries, however, even queries that do not
seem much different on the surface,
the result is drastically different and
far more complex. The following query produces an expression tree that
goes on for five pages:
from user in users
where user.IsActive
from task in user.Tasks
where task.Project.Owners.Contains(user) &&
      task.DueDate < DateTime.Now &&
      task.Completed == false
select new { task.Title, task.Status, user.UserName };
The first query is three lines and
has an expression tree that goes on
for about a page. The second is five
lines, and has an expression that
goes on for five pages. You can see
the full expression trees for both queries at
http://queue.acm.org/downloads/2011/Trivial_Query.html
and
http://queue.acm.org/downloads/2011/Non_Trivial_Query.html,
respectively. For compiler writers, this
just makes sense. For developers who
aren’t well versed in compiler theory,
this is a big stumbling block.
Figure 3. Trivial query AST, as generated by NRefactory.
• FromClause=QueryExpressionFromClause
  o Identifier=user
  o InExpression
    • IdentifierExpression Identifier=users
  o MiddleClauses
    • QueryExpressionWhereClause Condition
      • BinaryOperatorExpression
        • Left
          • MemberReferenceExpression
            • TargetObject Identifier=user
            • MemberName=Id
        • Op=Equality
        • Right
          • PrimitiveExpression
            • Value=1
  o SelectOrGroupClause
    • QueryExpressionSelectClause Projection
      • IdentifierExpression Identifier=user
net/NRefactory.ashx) to process it.
Working with something like this
is drastically simpler than working
with the compiler output, because we
now have the relevant meaning. As it
stands, we have to extract the meaning
ourselves, which is not trivial.
There are two lessons to take from
this. The first is that handing developers
the raw compiler output as an expression tree is a fairly poor choice for
those who need to build LINQ providers
on top of it. Even fairly minimal work
(such as that seen in Figure 3) would have
made creating LINQ providers drastically easier. I do not think this
could be made to work for all cases; I
believe that some complexity would
remain, if only because we need to support arbitrarily
complex inputs.
The vast majority of cases, however, fall into fairly predictable patterns, and solving this problem at the
compiler/framework level would have
made complex expression
trees rare rather than everyday occurrences. That is why I strongly encourage the use of third-party libraries,
such as the re-linq project, that take
on much of the work of putting the common transformations into a more reasonable format.
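As a rough illustration of what such a normalization step does, the sketch below walks the raw tree with the framework's ExpressionVisitor and flattens every "member operator constant" comparison into a simple record, closer in spirit to Figure 3. This is not re-linq's API; SimpleCondition, ConditionCollector, QueryModelDemo, and the User class are hypothetical names made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Placeholder entity, assumed for this example only.
public class User
{
    public int Id { get; set; }
}

// Hypothetical flattened form of a "member <operator> constant" test.
public sealed class SimpleCondition
{
    public string Member { get; set; }
    public ExpressionType Operator { get; set; }
    public object Value { get; set; }
}

// Hypothetical collector: records each comparison it recognizes.
public sealed class ConditionCollector : ExpressionVisitor
{
    public List<SimpleCondition> Conditions { get; } = new List<SimpleCondition>();

    protected override Expression VisitBinary(BinaryExpression node)
    {
        var member = node.Left as MemberExpression;
        var constant = node.Right as ConstantExpression;
        if (member != null && constant != null)
        {
            Conditions.Add(new SimpleCondition
            {
                Member = member.Member.Name,
                Operator = node.NodeType,
                Value = constant.Value
            });
            return node; // recognized shape, nothing further to descend into
        }
        return base.VisitBinary(node); // e.g. &&: keep walking both operands
    }
}

public static class QueryModelDemo
{
    public static List<SimpleCondition> Collect(Expression expression)
    {
        var collector = new ConditionCollector();
        collector.Visit(expression);
        return collector.Conditions;
    }

    public static void Main()
    {
        IQueryable<User> users = new User[0].AsQueryable();
        var query = from user in users
                    where user.Id == 1
                    select user;

        foreach (var condition in Collect(query.Expression))
            Console.WriteLine(condition.Member + " " + condition.Operator + " " + condition.Value);
    }
}
```

Even this toy version hints at the trouble: comparing against a captured local variable instead of a literal puts a member access on a compiler-generated closure object on the right-hand side rather than a ConstantExpression, so the pattern above would silently miss it.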
There are other issues in implementing a proper LINQ provider:
• Different languages provide different expression trees for the same
logical code. For example, equality is
handled differently in C# and VB, depending on whether you are comparing to a string or not.
• Different versions of the compiler
can emit different expression trees for
the same queries.
• There is absolutely no documentation about expression trees that
helps to build a LINQ provider.
• Your input is basically any legal
expression in C# or VB.Net, which
means that the option space you have
to deal with is huge. You can throw
an exception if you don’t understand
the query, of course, but that means
that users will discover unsupported
scenarios only at runtime, defeating
much of the point of having queries
integrated into the language.
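The last point can be made concrete with a small sketch. The validator below accepts only the query operators themselves and throws NotSupportedException for any other method call; the query that trips it compiles without complaint, so the failure surfaces only when the provider inspects the tree at runtime. SupportedSubsetValidator, UnsupportedQueryDemo, and the User class are hypothetical names, and a real provider would recognize far more than this.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Placeholder entity, assumed for this example only.
public class User
{
    public int Id { get; set; }
    public string UserName { get; set; }
}

// Hypothetical validator: allows calls to Queryable operators and
// rejects every other method call it meets in the tree.
public sealed class SupportedSubsetValidator : ExpressionVisitor
{
    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.DeclaringType != typeof(Queryable))
        {
            throw new NotSupportedException(
                "Method '" + node.Method.Name + "' cannot be translated by this provider.");
        }
        return base.VisitMethodCall(node);
    }
}

public static class UnsupportedQueryDemo
{
    // Throws NotSupportedException if the tree contains an unsupported call.
    public static void Validate(Expression expression)
    {
        new SupportedSubsetValidator().Visit(expression);
    }

    public static void Main()
    {
        IQueryable<User> users = new User[0].AsQueryable();

        // Legal C#, so the compiler accepts it; the compiler knows
        // nothing about what the provider can translate.
        var query = from user in users
                    where user.UserName.StartsWith("a")
                    select user;

        try
        {
            Validate(query.Expression);
            Console.WriteLine("Query is supported.");
        }
        catch (NotSupportedException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```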
To make matters worse, users often
come up with the most complex que-