interview

other than tables, such as objects or XML. Some readers might recognize him from his brief stint as the “Head in the Box” on Microsoft’s VBTV.

Blakeley is lead architect in the SQL Server engine working on server-side programmability, scale-out query processing, and object-relational technologies. He joined Microsoft in 1994 and since then has been an architect of several of Microsoft’s data-access technologies. Like Meijer, Blakeley’s main focus is on the impedance-mismatch problem. He served as the lead architect for the ADO.NET Entity Framework, which works with LINQ to raise the level of abstraction and simplify data programming. Before joining Microsoft, Blakeley was a member of the technical staff at Texas Instruments, where he developed the Open OODB (object-oriented database) management system for DARPA.

 

TERRY COATTA I think that a number of our Queue readers really don’t understand how LINQ differs from what developers have been doing up until now. A lot of people are embedding SQL queries directly into their application code. Can you tell us what’s different about LINQ and why developers should care about it?

ERIK MEIJER While, superficially, LINQ might look like you’re embedding SQL queries in your code, actually it’s radically different. What LINQ really does is allow you to query arbitrary collections, which could be tables, objects, in-memory objects, or XML.

The secret sauce behind LINQ is what we call the standard query operators. If you have a data source on which you can define these standard query operators, then you can query it using LINQ. Think about it like this: you have SQL, which is based on relational algebra. Now imagine that we abstract from the relational part and have a query algebra that is represented by these standard query operators.

So, now the languages have this query syntax that they translate into calls to these query operators, but each data source can give a different implementation of these operators, allowing you to query over a wide range of data sources. One example is querying over tables, but it’s very different from using SQL.
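As a rough illustration of that idea, the following Python sketch (not LINQ itself, and not taken from the interview) defines two hypothetical operators, `where` and `select`, and two hypothetical sources that implement them differently: `ListSource` filters rows in memory, while `SqlTextSource` builds a SQL string. All class, function, and field names here are assumptions made for the example.

```python
import operator

# Hypothetical comparison table used by the in-memory implementation.
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}


class ListSource:
    """Implements the operators by filtering dictionaries held in memory."""

    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, field, op, value):
        return ListSource(r for r in self.rows if OPS[op](r[field], value))

    def select(self, field):
        return [r[field] for r in self.rows]


class SqlTextSource:
    """Implements the same operators by building a SQL string instead."""

    def __init__(self, table, conditions=()):
        self.table = table
        self.conditions = tuple(conditions)

    def where(self, field, op, value):
        sql_op = "=" if op == "==" else op
        # repr() is used only to keep the sketch short; real code would
        # pass values as query parameters rather than splice them in.
        condition = f"{field} {sql_op} {value!r}"
        return SqlTextSource(self.table, self.conditions + (condition,))

    def select(self, field):
        sql = f"SELECT {field} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql


def expensive_names(source):
    """One query, written against the operators rather than a data source."""
    return source.where("price", ">", 100).select("name")


products = [
    {"name": "laptop", "price": 900},
    {"name": "mouse", "price": 25},
]
in_memory_result = expensive_names(ListSource(products))
sql_result = expensive_names(SqlTextSource("products"))
```

The query in `expensive_names` never changes; only the implementation behind the operators does, which is the point being made about data sources supplying their own operators.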

The other problem with embedding SQL inside the language is that it’s not really well integrated, so SQL is embedded but it has a different syntax and type system. You’ll have variables in your program you want to use inside the embedded SQL, and maybe the other way around, because you’ll want to use the results that you get back from the query in your program. There’s usually quite a nasty boundary between the two worlds, but with LINQ it’s completely integrated into the language, so the type system, everything, is that of the host language. It’s a seamless experience.

Those are the two big differences. The fact that it’s integrated in the language means that you get IntelliSense and compile-time type checking, all those things that are much harder when you are embedding SQL inside the language.
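The boundary described above can be seen in a small Python sketch that uses the standard `sqlite3` module. It is only an analogy: Python checks types with optional external tools rather than at compile time, and the `Product` class, table, and column names are assumptions made for the example. The embedded query is an opaque string, so a misspelled column surfaces only when the statement runs, while the in-language query uses ordinary attributes that an editor or static checker can see.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float


products = [Product("laptop", 900.0), Product("mouse", 25.0)]

# Load the same data into an in-memory database for the embedded-SQL side.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(p.name, p.price) for p in products],
)

min_price = 100.0

# Embedded SQL: the program variable crosses the boundary as a parameter,
# and the results come back as plain tuples indexed by position.
rows = conn.execute(
    "SELECT name FROM products WHERE price > ?", (min_price,)
).fetchall()
embedded_names = [row[0] for row in rows]

# A typo inside the query string is discovered only at run time.
try:
    conn.execute("SELECT nmae FROM products")
    typo_error = None
except sqlite3.OperationalError as exc:
    typo_error = str(exc)

# In-language query: the variable and the fields are ordinary identifiers
# of the host language, so tooling can complete and check them.
integrated_names = [p.name for p in products if p.price > min_price]
```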

TC You talked about an abstract query algebra. Is it a stretch to apply that algebra to different data sources such as relational tables and XML, or do you think that it’s a pretty natural match?

EM I think it’s a pretty natural match. Let’s take the two concrete ones—SQL and XQuery. The syntax is slightly different. In SQL you start with SELECT and then you do a FROM and a WHERE, and in XQuery you start with a FOR and end with RETURN instead of SELECT. But conceptually you’re doing exactly the same thing.

JOSÉ BLAKELEY If I may add a little, at the outside level, it’s the same type of collections-based operation. If there is a difference, it would be that in the relational world you would be going over collections of records. In the XML world you would be going over collections of objects that model the infosets for XML. It’s a collection of elements, a collection of attributes, and those kinds of things. In essence, you are performing operations over collections of things.
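A short Python sketch of the same point, using the standard `xml.etree.ElementTree` module: one filter-and-project operation is applied first to a collection of records and then to a collection of XML elements. The helper `names_over` and the sample data are hypothetical, chosen only to show that the operation is the same and only the way a field is read differs.

```python
import xml.etree.ElementTree as ET

# A collection of records, as in the relational world.
records = [("laptop", 900), ("mouse", 25)]

# A collection of elements, as in the XML world.
doc = ET.fromstring(
    "<products>"
    "<product name='laptop' price='900'/>"
    "<product name='mouse' price='25'/>"
    "</products>"
)


def names_over(items, get_name, get_price, limit):
    """Filter then project; works over any collection of things."""
    return [get_name(item) for item in items if get_price(item) > limit]


from_records = names_over(records, lambda r: r[0], lambda r: r[1], 100)
from_xml = names_over(
    doc.findall("product"),
    lambda e: e.get("name"),
    lambda e: int(e.get("price")),
    100,
)
```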

TC I would like to step sideways for a moment and talk about the Entity Framework. A lot of systems are built where the developer issues an SQL query, brings some results into a data table or dataset, and binds the result of that into UI components. What’s different about the Entity Framework? Why should developers care about it?

JB A very common problem that application developers face is what has been called impedance mismatch. Usually the data model used by the application side of the problem is richer and more directly associated with the business problem at hand. The vocabulary is very closely related to the needs of the business (your customers, your line items, your products, and your inventory), whereas in the database you’re dealing with similar concepts, but the artifacts of database normalization and schema design get in the way of what the applications really need.

All of a sudden, database concerns start being inserted. These are foreign to the application, so every single application has to have a layer that bridges the gap between the world of data and the world of the program. Tons of application developers see that mapping, if you will, as part of the application.
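The kind of hand-written bridging layer described here might look like the following Python sketch. It is not the Entity Framework and not from the interview; the `Order` and `LineItem` classes, the row layouts, and `load_orders` are all hypothetical. Normalized rows from two tables are stitched back into the nested objects the application actually wants to work with.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)


# Normalized storage: orders and line items live in separate tables,
# linked by the order id in the first column of each line-item row.
order_rows = [(1, "Ada"), (2, "Grace")]
line_item_rows = [(1, "laptop", 1), (1, "mouse", 2), (2, "monitor", 1)]


def load_orders(order_rows, line_item_rows):
    """Mapping layer: rebuild application objects from normalized rows."""
    orders = {oid: Order(oid, customer) for oid, customer in order_rows}
    for oid, product, quantity in line_item_rows:
        orders[oid].items.append(LineItem(product, quantity))
    return list(orders.values())


orders = load_orders(order_rows, line_item_rows)
```

Every application that talks to this schema needs some version of `load_orders`, which is the repeated effort the mapping layer represents.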
