How to write a LINQ to SQL provider in C# and .NET 4.0+ (part 4 of 4) - compiling expression trees

Compiling query-specific plumbing methods using expression trees

In the previous post I described how the where clause is generated, including all the parameters referenced within it. In this final part, I'll explain what happens once the SQL query text has been constructed: the provider dynamically generates two compiled delegates that, respectively:

- Create a list of SQL parameters for the query and resolve their runtime values from the expression tree.
- Read values from the resulting dataset, cast them to the appropriate types where necessary, and assign them to the corresponding data object members.

If you haven't read the previous articles in this series on creating a C# to SQL provider (parser), you can find them here:

Part 1: Introduction
Part 2: Expression Visitor
Part 3: Where Clause Visitor
Part 4: Compiling Expression Trees (this)

The code accompanying this article can be found here:

C# to SQL provider sample download

Inside the DataModelQueryCommandExt class, one of the Create methods contains the logic that handles:

Instantiating the expression visitor and letting it parse the expression tree assembled from the C# query method calls.
Using the SQL text, column-to-member mappings and parameter collection generated by the visitor to dynamically compile two delegates:
- One builds a list of SqlParameter objects from the expression tree at runtime, extracting the values from the appropriate tree nodes and assigning them to the corresponding parameter objects.
- The other uses the column-to-member mappings to read and cast values from the result set, creating a new data object for each row.

The expression visitor was covered in the previous parts, so let's take a closer look at how the functions are compiled. This is again done with the System.Linq.Expressions namespace, which provides a convenient way of having code write code. For each query, the provider has to return a collection of objects of some type X, with its N members populated from the corresponding columns and type casts applied where necessary. There is more than one way to write an algorithm that takes a data reader plus type and member information, iterates over the records and performs those operations. However, the type and member definitions don't change between executions of the same query; the only variable is the data reader. The most efficient approach is therefore to have the provider generate code in exactly the same way a developer would write it by hand. Those instructions are compiled into a delegate, cached, and called every time a new data reader is created.
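A minimal sketch of the compile-once, call-many idea, written as a .NET 6+ top-level program rather than the sample's .NET 4.0 code. ReaderCache, BuildReadId and the SQL-text cache key are hypothetical names, and an in-memory DataTable stands in for the database. The generated delegate is what a developer would write by hand, r => (int)r.GetValue(0), but Compile() runs only on first use:

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Linq.Expressions;

// In-memory result set so the sketch runs without a database.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Rows.Add(1);
table.Rows.Add(2);

int total = 0;
for (int run = 0; run < 3; run++) // three executions of the same query
{
    Func<IDataRecord, int> read = ReaderCache.GetOrCompile<int>("SELECT Id FROM T", BuildReadId);
    using IDataReader reader = table.CreateDataReader(); // a new reader per execution
    while (reader.Read()) total += read(reader);
}
Console.WriteLine($"{total} {ReaderCache.CompileCount}"); // 9 1

// Generates r => (int)r.GetValue(0), i.e. the code a developer would write by hand.
static Expression<Func<IDataRecord, int>> BuildReadId()
{
    ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");
    Expression value = Expression.Call(r, typeof(IDataRecord).GetMethod(nameof(IDataRecord.GetValue))!, Expression.Constant(0));
    return Expression.Lambda<Func<IDataRecord, int>>(Expression.Convert(value, typeof(int)), r);
}

// Hypothetical cache: one compiled delegate per query shape.
static class ReaderCache
{
    static readonly ConcurrentDictionary<string, Delegate> Cache = new();
    public static int CompileCount;

    // Compiles on the first request for a key; later requests reuse the cached delegate.
    public static Func<IDataRecord, T> GetOrCompile<T>(string key, Func<Expression<Func<IDataRecord, T>>> build) =>
        (Func<IDataRecord, T>)Cache.GetOrAdd(key, _ =>
        {
            CompileCount++;
            return build().Compile();
        });
}
```

Keying the cache on the SQL text is an assumption made for brevity; the point is that the expensive Compile() call happens once, while the cheap delegate call happens for every reader.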

Dynamic delegate to extract parameter values from the expression tree

The method responsible is GenerateGetSqlParametersFunc. It builds up code instructions, which are then compiled into a delegate that accepts an expression tree as its argument and returns a list of SqlParameter objects populated with the runtime values of the variables extracted from the tree. How does it work?

The first thing to notice is a couple of reflection calls that get the constructor and the Value property of the SqlParameter .NET type. These are used together with the Expression.New and Expression.Bind instructions.
These instructions are repeated for every parameter in the collection. As a reminder, the collection is populated by the visitor/parser from the variables and constants referenced in the language query.
Each query parameter mapping specifies the parameter's CLR type, which has to be converted to a database type; that database type is passed to the constructor expression as one of its arguments.
The name of the parameter is represented as a constant expression and is also passed to the constructor.
The expression that represents the value is bound to the parameter's Value property via a bind expression.
Then the member init expression (the C# equivalent of new SqlParameter(name, dbType) { Value = X }) combines the constructor and the property bind expressions.
Each init expression is added to a collection, which is later used by the list init expression (the equivalent of new List<SqlParameter> { a, b, c }).
Finally, the list init expression is wrapped inside a lambda and compiled to a delegate.
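The steps above can be sketched as follows. This is an illustration under assumptions, not GenerateGetSqlParametersFunc itself: SqlParam is a hypothetical stand-in for SqlParameter (same (string, SqlDbType) constructor and Value property) so the program runs without the SqlClient package; ParamFuncBuilder, ToDbType and the single hard-coded mapping are invented for the example; and the path from the tree to the captured variable is hard-wired for a one-comparison query, whereas the real visitor records one mapping per variable or constant.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq.Expressions;
using System.Reflection;

int minAge = 30;
Expression<Func<Person, bool>> query = p => p.Age > minAge;

// Built once per query shape, then called for every execution.
Func<Expression, List<SqlParam>> getParams = ParamFuncBuilder.Build(query);
Console.WriteLine(getParams(query)[0].Value); // 30
minAge = 45;                                  // the closure referenced by the tree now holds 45
Console.WriteLine(getParams(query)[0].Value); // 45

sealed class Person { public int Age { get; set; } }

// Hypothetical stand-in for System.Data.SqlClient.SqlParameter(string, SqlDbType).
sealed class SqlParam
{
    public SqlParam(string name, SqlDbType dbType) { Name = name; DbType = dbType; }
    public string Name { get; }
    public SqlDbType DbType { get; }
    public object? Value { get; set; }
}

static class ParamFuncBuilder
{
    public static Func<Expression, List<SqlParam>> Build(LambdaExpression sample)
    {
        ParameterExpression tree = Expression.Parameter(typeof(Expression), "tree"); // the delegate's argument
        ConstructorInfo ctor = typeof(SqlParam).GetConstructor(new[] { typeof(string), typeof(SqlDbType) })!;
        PropertyInfo valueProp = typeof(SqlParam).GetProperty(nameof(SqlParam.Value))!;

        // Hypothetical visitor output: one captured variable on the right of the comparison.
        var node = (MemberExpression)((BinaryExpression)sample.Body).Right;
        var mappings = new[] { (Name: "@p0", Node: node) };

        var inits = new List<Expression>();
        foreach (var m in mappings)
        {
            // ((ConstantExpression)((MemberExpression)((BinaryExpression)((LambdaExpression)tree).Body).Right).Expression).Value
            Expression body = Expression.Property(Expression.Convert(tree, typeof(LambdaExpression)), nameof(LambdaExpression.Body));
            Expression right = Expression.Property(Expression.Convert(body, typeof(BinaryExpression)), nameof(BinaryExpression.Right));
            Expression owner = Expression.Property(Expression.Convert(right, typeof(MemberExpression)), nameof(MemberExpression.Expression));
            Expression closure = Expression.Property(Expression.Convert(owner, typeof(ConstantExpression)), nameof(ConstantExpression.Value));
            // Read the captured field from the closure object and box it for the Value property.
            Expression value = Expression.Convert(
                Expression.MakeMemberAccess(Expression.Convert(closure, m.Node.Expression!.Type), m.Node.Member),
                typeof(object));

            // new SqlParam("@p0", SqlDbType.Int) { Value = value }
            NewExpression create = Expression.New(ctor, Expression.Constant(m.Name), Expression.Constant(ToDbType(m.Node.Type)));
            inits.Add(Expression.MemberInit(create, Expression.Bind(valueProp, value)));
        }

        // new List<SqlParam> { ... }, wrapped in a lambda over the tree and compiled.
        ListInitExpression list = Expression.ListInit(Expression.New(typeof(List<SqlParam>)), inits);
        return Expression.Lambda<Func<Expression, List<SqlParam>>>(list, tree).Compile();
    }

    // Hypothetical CLR-to-database type mapping; a real provider would cover more types.
    static SqlDbType ToDbType(Type t) =>
        t == typeof(int) ? SqlDbType.Int
        : t == typeof(string) ? SqlDbType.NVarChar
        : t == typeof(bool) ? SqlDbType.Bit
        : t == typeof(DateTime) ? SqlDbType.DateTime
        : SqlDbType.Variant;
}
```

Because the compiled delegate reads the value out of whatever tree it is handed, the same delegate picks up the new value of minAge without being rebuilt.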

Dynamic delegate to populate data object members from DataReader

This one is a bit trickier, but still nothing too fancy:

The function takes a single argument, the data reader, defined using Expression.Parameter.
A local temporary variable holds each value extracted from the DataReader until it is converted and assigned to the corresponding member.
Reflection is used to get a reference to the GetValue method of the IDataReader interface, which is called to read each value.
Because the entire result set can be just a single value (e.g. from a Count() query), the first thing to check is whether that is the case. In this simple scenario the primitive value is either a bool or an int (for methods like Exists and Count), so converting it to the right type and returning it is enough. The cast is then the only expression in the block, which is wrapped in a lambda and compiled.
With composite return types (data objects), things get more complicated. First, the dataset may contain more than one table (not in our case). Since a call can return only one object, a container has to be instantiated to hold one data item per table: if only one table is selected, the table object itself is returned, but for two or three tables a container is needed. So there are more things to instantiate.
Next, all the table mappings provided by the visitor are iterated, and reflection is used to get a constructor for each one.
The member mappings of each table are iterated in the same order as the corresponding column names appear in the select clause, so the values can simply be read by ordinal (column index). The DataReader.GetValue call with the index as its argument is defined using Expression.Call and Expression.Constant.
A property mapped to a database column needs a setter; otherwise no value can be assigned to it, and the property is skipped.
Next, the value needs to be converted to the right type (if it differs). How that is done is decided using several heuristics:
- Does the property have an attribute denoting its database type, from which the CLR type should be inferred? If so, does that type differ from the member type?
- Is the member an enum? If so, and the database type is not an integer (from which an enum value can be created), the value is converted to int first.
- Can a direct cast be performed? If not, does the value implement the IConvertible interface, and does IConvertible have a method that converts to the target type? If so, that method is called and its result assigned.
- Is the target property a value type? Is it nullable?
- If the source value is DBNull, the property is set to null for a reference type, and to the type's default value otherwise.
The get value expression is wrapped in the conversion expression and bound to the target property using Expression.Bind.
All the bindings are appended to a collection, and the table object is later instantiated via Expression.New together with Expression.MemberInit.
Finally, if there is more than one table object, a container is constructed with the item init expressions as its constructor arguments. The result is wrapped in a lambda and compiled.
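Condensing the steps above, a single-table materializer might look like the sketch below (a .NET 6+ top-level program). Materializer, Item, Status and the (member, column type) mapping tuples are hypothetical; Convert.ChangeType stands in for looking up the matching IConvertible method, and a column type supplied with each mapping replaces the attribute-based database type check. Reflection targets IDataRecord because GetValue is declared there (IDataReader inherits it), and Type.GetMethod on an interface does not search the interfaces it extends.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq.Expressions;
using System.Reflection;

// In-memory result set standing in for "SELECT Id, Name, Label, Status, Score ...".
var table = new DataTable();
table.Columns.Add("Id", typeof(long));      // wider than the int property: converted
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Label", typeof(string)); // mapped to a get-only property: skipped
table.Columns.Add("Status", typeof(short)); // not int: normalised to int for the enum
table.Columns.Add("Score", typeof(double));
table.Rows.Add(1L, "Ann", "x", (short)1, 9.5);
table.Rows.Add(2L, DBNull.Value, "y", (short)0, DBNull.Value);

// Hypothetical member mappings in select-clause order (ordinal = array index).
var columns = new[] { ("Id", typeof(long)), ("Name", typeof(string)), ("Label", typeof(string)),
                      ("Status", typeof(short)), ("Score", typeof(double)) };
Func<IDataRecord, Item> readItem = Materializer.Build<Item>(columns).Compile();

var items = new List<Item>();
using (IDataReader reader = table.CreateDataReader())
    while (reader.Read()) items.Add(readItem(reader));
Console.WriteLine($"{items[0].Id} {items[0].Name} {items[0].Status}");                     // 1 Ann Active
Console.WriteLine($"{items[1].Name is null} {items[1].Status} {items[1].Score is null}"); // True Inactive True

// Scalar result, e.g. from a Count() query.
var counts = new DataTable();
counts.Columns.Add("Count", typeof(int));
counts.Rows.Add(2);
Func<IDataRecord, int> readCount = Materializer.Build<int>(Array.Empty<(string, Type)>()).Compile();
int count;
using (IDataReader r = counts.CreateDataReader()) { r.Read(); count = readCount(r); }
Console.WriteLine(count); // 2

enum Status { Inactive = 0, Active = 1 }

sealed class Item
{
    public int Id { get; set; }
    public string? Name { get; set; }
    public string Label => $"#{Id}"; // no setter
    public Status Status { get; set; }
    public double? Score { get; set; }
}

static class Materializer
{
    static readonly MethodInfo GetValue = typeof(IDataRecord).GetMethod(nameof(IDataRecord.GetValue))!;
    static readonly MethodInfo ToInt32 = typeof(Convert).GetMethod(nameof(Convert.ToInt32), new[] { typeof(object) })!;
    static readonly MethodInfo ChangeType = typeof(Convert).GetMethod(nameof(Convert.ChangeType), new[] { typeof(object), typeof(Type) })!;

    public static Expression<Func<IDataRecord, T>> Build<T>((string Member, Type ColumnType)[] columns)
    {
        ParameterExpression record = Expression.Parameter(typeof(IDataRecord), "record"); // the single argument
        ParameterExpression temp = Expression.Variable(typeof(object), "temp");            // raw column value

        if (typeof(T).IsPrimitive) // scalar result (Count, Exists): convert column 0 and return it
        {
            Expression raw = Expression.Call(record, GetValue, Expression.Constant(0));
            return Expression.Lambda<Func<IDataRecord, T>>(
                Expression.Convert(Expression.Call(ChangeType, raw, Expression.Constant(typeof(T), typeof(Type))), typeof(T)), record);
        }

        var bindings = new List<MemberBinding>();
        for (int i = 0; i < columns.Length; i++)
        {
            PropertyInfo? prop = typeof(T).GetProperty(columns[i].Member);
            if (prop is null || !prop.CanWrite) continue; // no setter: skip (ordinal i is unchanged)

            // temp = record.GetValue(i); temp is DBNull ? default(P) : convert(temp)
            Expression value = Expression.Block(
                Expression.Assign(temp, Expression.Call(record, GetValue, Expression.Constant(i))),
                Expression.Condition(Expression.TypeIs(temp, typeof(DBNull)),
                    Expression.Default(prop.PropertyType),
                    ConvertTo(temp, columns[i].ColumnType, prop.PropertyType)));
            bindings.Add(Expression.Bind(prop, value));
        }

        // new T { P1 = ..., P2 = ... } inside a block that declares temp
        Expression body = Expression.Block(new[] { temp }, Expression.MemberInit(Expression.New(typeof(T)), bindings));
        return Expression.Lambda<Func<IDataRecord, T>>(body, record);
    }

    static Expression ConvertTo(Expression value, Type columnType, Type target)
    {
        Type underlying = Nullable.GetUnderlyingType(target) ?? target;
        Expression converted =
            columnType == underlying ? Expression.Convert(value, underlying)                      // direct cast
            : underlying.IsEnum ? Expression.Convert(Expression.Call(ToInt32, value), underlying) // enum via int
            : Expression.Convert(Expression.Call(ChangeType, value, Expression.Constant(underlying, typeof(Type))), underlying); // IConvertible
        return Expression.Convert(converted, target); // identity, or lift to Nullable<T>
    }
}
```

For a multi-table select, the per-table MemberInit expressions would instead be passed as arguments to a container's constructor (a Tuple<T1, T2>, for instance) before the lambda is compiled.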