other than tables, such as objects or XML. Some readers
might recognize him from his brief stint as the “Head in
the Box” on Microsoft’s VBTV.
Blakeley is lead architect in the SQL Server engine
working on server-side programmability, scale-out query
processing, and object-relational technologies. He joined
Microsoft in 1994 and since then has been an architect of
several of Microsoft’s data-access technologies. Like Meijer, Blakeley’s main focus is on the impedance-mismatch
problem. He served as the lead architect for the ADO.NET
Entity Framework, which works with LINQ to raise
the level of abstraction and simplify data programming.
Before joining Microsoft, Blakeley was a member of the
technical staff at Texas Instruments, where he developed
the Open OODB (object-oriented database) management
system for DARPA.
TERRY COATTA I think that a number of our Queue
readers really don’t understand how LINQ differs from what
developers have been doing up until now. A lot of people
are embedding SQL queries directly into their application
code. Can you tell us what’s different about LINQ and
why developers should care about it?
ERIK MEIJER While, superficially, LINQ might look like
you’re embedding SQL queries in your code, actually it’s
radically different. What LINQ really does is allow you to
query arbitrary collections, which could be tables, objects,
in-memory objects, or XML.
The secret sauce behind LINQ is what we call the standard query operators. If you have a data source on which
you can define these standard query operators, then you
can query it using LINQ. Think about it like this: you
have SQL, which is based on relational algebra. Now
imagine that we abstract from the relational part and
have a query algebra that is represented by these standard
query operators.
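As a rough illustration of an algebra of query operators, the Python sketch below (Python stands in for a LINQ-enabled language here) defines two functions modeled loosely on LINQ's Where and Select operators and composes them the way a compiler might rewrite a query expression. The function names, the dictionary-based customer records, and the sample data are assumptions made for illustration, not LINQ's actual API.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")

# Hypothetical stand-ins for two standard query operators.
def where(source: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Lazily keep only the items for which predicate returns True."""
    for item in source:
        if predicate(item):
            yield item

def select(source: Iterable[T], selector: Callable[[T], U]) -> Iterator[U]:
    """Lazily project each item through selector."""
    for item in source:
        yield selector(item)

customers = [
    {"name": "Alice", "city": "London"},
    {"name": "Bob", "city": "Paris"},
    {"name": "Carol", "city": "London"},
]

# A query written as
#   from c in customers where c.city == "London" select c.name
# is rewritten into nested operator calls, roughly:
names = list(select(where(customers, lambda c: c["city"] == "London"),
                    lambda c: c["name"]))
print(names)  # ['Alice', 'Carol']
```

Because the operators accept any iterable, the same composition works over lists, generators, or any other source that can be enumerated.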
So, now the languages have this query syntax that
they translate into calls to these query operators, but each
data source can give a different implementation of these
operators, allowing you to query over a wide range of
data sources. One example is querying over tables, but it’s
very different from using SQL.
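To sketch what it means for each data source to supply its own implementation of the operators, the hypothetical classes below expose the same two operations. `InMemorySource` filters rows directly, while `SqlSource` only records the filter and renders SQL text for a database to run. All class and method names are invented for this example; real LINQ providers receive the query as an expression tree rather than the simple column/value pair assumed here.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class InMemorySource:
    rows: list[dict[str, Any]]

    def where_eq(self, column: str, value: Any) -> "InMemorySource":
        # Evaluate the filter immediately over in-memory rows.
        return InMemorySource([r for r in self.rows if r[column] == value])

    def select(self, column: str) -> list[Any]:
        return [r[column] for r in self.rows]

@dataclass
class SqlSource:
    table: str
    conditions: list[str] = field(default_factory=list)

    def where_eq(self, column: str, value: Any) -> "SqlSource":
        # Record the filter instead of running it. repr() quotes the literal;
        # this is illustrative only, not safe SQL escaping.
        return SqlSource(self.table, self.conditions + [f"{column} = {value!r}"])

    def select(self, column: str) -> str:
        # Render SQL text that a database engine would execute.
        sql = f"SELECT {column} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql

def london_names(source):
    # The same query code runs unchanged against either source.
    return source.where_eq("city", "London").select("name")

rows = [
    {"name": "Alice", "city": "London"},
    {"name": "Bob", "city": "Paris"},
    {"name": "Carol", "city": "London"},
]
print(london_names(InMemorySource(rows)))  # ['Alice', 'Carol']
print(london_names(SqlSource("customers")))
# SELECT name FROM customers WHERE city = 'London'
```

The query itself is written once; only the source decides whether it becomes a loop over memory or a statement sent to a database.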
The other problem with embedding SQL inside the
language is that it’s not really well integrated, so SQL is
embedded but it has a different syntax and type system.
You’ll have variables in your program you want to use
inside the embedded SQL, and maybe the other way
around, because you’ll want to use the results that you
get back from the query in your program. There’s usually
quite a nasty boundary between the two worlds, but with
LINQ it’s completely integrated into the language, so the
type system—everything—is that of the host language.
It’s a seamless experience.
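The boundary between the two worlds can be seen in a small Python analogy, using the standard `sqlite3` module for the embedded case and a list comprehension over dataclass objects as a stand-in for a language-integrated query. The table, the `Customer` class, and the data are assumptions; the point is only where program variables and query results cross over.

```python
import sqlite3
from dataclasses import dataclass

# Embedded SQL: the query is an opaque string to the host language.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Alice", "London"), ("Bob", "Paris"), ("Carol", "London")])

city = "London"  # a program variable passed into SQL through a ? placeholder
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
).fetchall()
# Results come back as tuples that must be unpacked by position.
embedded = [row[0] for row in rows]

# Integrated: the query is ordinary host-language code over host-language types.
@dataclass
class Customer:
    name: str
    city: str

customers = [Customer("Alice", "London"), Customer("Bob", "Paris"),
             Customer("Carol", "London")]
integrated = [c.name for c in customers if c.city == city]

print(embedded, integrated)  # ['Alice', 'Carol'] ['Alice', 'Carol']
```

A misspelled column inside the SQL string is only reported when the statement runs, whereas a misspelled attribute such as `c.cty` is ordinary Python that a static checker like mypy can flag beforehand.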
Those are the two big differences. The fact that it’s
integrated in the language means that you get IntelliSense, compile-time type checking—all those things that
are much harder when you are embedding SQL inside the
language.
TC You talked about an abstract query algebra. Is it a
stretch to apply that algebra to different data sources such
as relational tables and XML, or do you think that it’s a
pretty natural match?
EM I think it’s a pretty natural match. Let’s take the two
concrete ones—SQL and XQuery. The syntax is slightly
different. In SQL you start with SELECT and then you do a
FROM and a WHERE, and in XQuery you start with a FOR
and end with RETURN instead of SELECT. But conceptually
you’re doing exactly the same thing.
JOSÉ BLAKELEY If I may add a little, at the outside
level, it’s the same type of collections-based operation.
If there is a difference, it would be that in the relational
world you would be going over collections of records. In
the XML world you would be going over collections of
objects that model the infosets for XML. It’s a collection
of elements, a collection of attributes, and those kinds of
things. In essence, you are performing operations over
collections of things.
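The idea that both cases are operations over collections can be sketched in Python: the same filter-and-project shape applied once to relational-style tuples and once to XML elements parsed with the standard `xml.etree.ElementTree` module. The comments show roughly equivalent SQL and XQuery; the data and element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Relational side: a collection of records.
records = [("Alice", "London"), ("Bob", "Paris"), ("Carol", "London")]

# XML side: a collection of elements carrying attributes.
doc = ET.fromstring(
    "<customers>"
    "<customer name='Alice' city='London'/>"
    "<customer name='Bob' city='Paris'/>"
    "<customer name='Carol' city='London'/>"
    "</customers>"
)

# SQL:    SELECT name FROM customers WHERE city = 'London'
from_records = [name for (name, city) in records if city == "London"]

# XQuery: for $c in /customers/customer
#         where $c/@city = "London"
#         return string($c/@name)
from_xml = [c.get("name") for c in doc.findall("customer")
            if c.get("city") == "London"]

print(from_records, from_xml)  # ['Alice', 'Carol'] ['Alice', 'Carol']
```

Only the way each item is read differs (tuple positions versus attribute lookups); the filter and projection steps are the same.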
TC I would like to step sideways for a moment and talk
about the Entity Framework. A lot of systems are built
where the developer issues an SQL query, brings some
results into a data table or dataset, and binds the result
of that into UI components. What’s different about the
Entity Framework? Why should developers care about it?
JB A very common problem that application developers face is what has been called impedance mismatch.
Usually the data model used by the application side of
the problem is richer and more directly associated with
the business problem at hand. The vocabulary is very
closely related to the needs of the business—your customers, your line items, your products, and your inventory—whereas in the database you’re dealing with similar
concepts, but the artifacts of database normalization and
schema design get in the way of what the applications
really need.
All of a sudden, database concerns start being inserted.
These are foreign to the application, so every single application has to have a layer that bridges the gap between
the world of data and the world of the program. Tons of
application developers see that mapping, if you will, as
part of the application.
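A minimal sketch of such a hand-written mapping layer, assuming a normalized schema with a customers table and an orders table linked by a customer ID. The function `map_rows` and the `Customer`/`Order` classes are hypothetical; they rebuild the application's object view from flat rows, which is the kind of per-application code that a mapping framework such as the Entity Framework is aimed at.

```python
from dataclasses import dataclass, field

# Normalized relational shape: two flat tables joined by a foreign key.
customer_rows = [(1, "Alice"), (2, "Bob")]                  # (customer_id, name)
order_rows = [(10, 1, "Widget"), (11, 1, "Gadget"),
              (12, 2, "Widget")]                            # (order_id, customer_id, product)

# Application shape: business vocabulary with orders nested under customers.
@dataclass
class Order:
    order_id: int
    product: str

@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list[Order] = field(default_factory=list)

def map_rows(customer_rows, order_rows) -> list[Customer]:
    """Hypothetical mapping layer: rebuild the object graph from flat rows."""
    by_id = {cid: Customer(cid, name) for cid, name in customer_rows}
    for order_id, customer_id, product in order_rows:
        # Resolve the foreign key into an object reference.
        by_id[customer_id].orders.append(Order(order_id, product))
    return list(by_id.values())

customers = map_rows(customer_rows, order_rows)
print([(c.name, [o.product for o in c.orders]) for c in customers])
# [('Alice', ['Widget', 'Gadget']), ('Bob', ['Widget'])]
```

Every application that talks to this schema would need some version of this code, and it has to change whenever either the schema or the object model changes.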