Custom LINQ-to-SQL provider: part 2, the expression visitor
This is the second post in the series of tutorials on how to make a C# to SQL integrated query provider. If you missed the introduction, it can be found in the index below:
The code accompanying this article can be found here:
C# to SQL provider sample download.
Parsing the expression tree – expression visitor
Now the heart and soul of the language integrated query provider. This section covers the process of deconstructing the expression tree, transforming it and translating it into an actual query text. This is all handled by a class called expression visitor. Its design is based on the visitor pattern. In the previous section the formation of the expression tree was explained. The resulting structure is then passed into the visitor, which needs to make sense out of it.
It's worth noting that the code that accompanies this article was written from ground up and doesn't contain any derivatives from other similar works, including the Microsoft LINQ.
Deconstructing the expression visitor - the select and from clauses
Throughout this series of articles, we'll be parsing the following seemingly simple statement:
(new DataModelQuery<FileModel>("TestQuery")) .Where(file => file.DirectoryId == test.Item1).GetCommand();
It can be found in the sample project attached to this article, along with the rest of the source code used to parse it.
So, what happens when the GetCommand method is executed? This is an extension method on IDataModelQuery object that contains the expression tree for the Where method. It calls into the Create method of DataModelQueryCommandExt private class that belongs to the provider. If this expression was already parsed before, the command should already be compiled and cached, so it can be returned right away. Otherwise, a new DbExpressionVisitor object is created.
The entry point into the visitor is coincidentally the non-specific “visit” method, which contains a big switch statement branching the execution to the appropriate function based on the type of expression passed to it. So, for example, if the type of expression is “call”, then it is cast into the CallExpression object, which would contain information about the actual real query method called from the consuming code. In this case, it's the “where” method, as this article doesn't cover any other for the sake of simplicity. The strongly typed expression is then passed to the appropriate function that determines the name of the method and its arguments. All the arguments, including the instance to which the method is applied need to be visited as well. The instance (first argument) is visited because in SQL query terms it's the source from which the select clause is generated. It can be a straight table select or a subquery. The second argument needs to be parsed because it's the lambda predicate, from which we need to generate the where clause text. What happens from here, is summarized in the diagram below:
First a StringBuilder instance is created to write the SQL query text. Then the select clause is written to it by analysing the leftmost argument - the DataModelQuery<FileModel> instance. FileModel type is the source table that appears in the from clause. Its properties marked with the DataColumn attribute are the columns that appear in the select clause. The names are derived from the attribute, where specified, or, otherwise - the member name itself.
The predicate, which is the next parameter (at index 1), needs to be visited as well to prepare it for the WhereClauseExpressionVisitor. A number of transformations happen here. First, the file argument introduces the table into the scope (note that a new BaseExpressionVisitor is created for each scope). The parameter is added to a dictionary that belongs to the new visitor.
Next the body of the lambda is parsed (file.DirectoryId == test.Item1), which is essentially a BinaryExpression with parts separated by the == operator. Both sides of it are visited. Each is a MemberExpression, but there are differences. The left one refers to the select list, while that on the right references something outside of the SQL query - a local variable in the C# code. Therefore the two are handled differently:
There is nothing much else that the base visitor needs to do with the lambda, and so it is returned to the caller (the visit method for a CallExpression of the original visitor) in the form:
file => ScopedTablesExpression.DirectoryId == RunTimeValueExpression.Item1
The transformed predicate expression is then fed into the WhereClauseVisitor (described in the next section) to produce the final piece of the puzzle - the where clause, along with the query parameters. All of that is encapsulated within a custom SqlExpression and returned to the command-building code that uses the select clause mappings to generate DataReader-to-model bindings, and parameter information to compile a set of instructions for supplying query values at runtime.
In general, this is how things work. For more information on the WhereClauseExpressionVisitor and generation of compiled parameter and data reader bindings to populate the SqlCommand object and extract values from the resultset, please see the following sections:
Abstraction of an abstraction - describing value access in an expression tree using expressions
As discussed earlier, it may be of interest to explain in more detail how remapping of variables referenced in the tree happens when encountered during parsing by the base visitor. So, in essence, certain branches are replaced by expression chains describing how to traverse from the root to the leaf (e.g. a ConstantExpression) to get its value.
Each visit method receives an object of type ParentObjectInfo as the last argument. This object, like a stack trace, describes all the expressions unwrapped to get to the one being currently visited. Every time that another visit call is made, the argument is constructed by encapsulating the previous such object, as well as the property name of the parent expression from which the child was derived. Why is this needed? This builds a trace of expressions in the hierarchy beginning at the root expression and leading up to the expression being evaluated in a given visit method. The RemapToCommonParameter method relies on this information to generate an expression chain that describes how to extract a runtime value out of the compiler-generated expression tree. This is a fairly abstract concept, an abstraction of an abstraction. When the compiler generates an expression based on the lambda syntax, it would usually contain a number of sub-expressions, each with their own children. So, a lambda expression (e.g. item => item == localVar.value) would have an argument expression and a body expression. The latter, in its turn, could be broken down into a binary expression (e.g. item == localVar.value), which is a composite of a parameter expression on the left and a member access expression on the right. Now, the expression tree structure will always remain the same on each run through the code. What may change is the values, object instances, etc. If we reconcile that with the fact that the provider needs to generate and compile a method to extract values out of the expression tree and populate them into the database command parameters, then this is what we get - the necessity to construct and compile an expression tree to evaluate an expression tree at runtime. So, in other words: