How to write a LINQ to SQL provider in C# and .NET 4.0+ (part 3 of 4) - where clause visitor

Custom LINQ-to-SQL provider: part 3, the where clause

This is the third post in the series of tutorials on how to make a C# to SQL integrated query provider. If you missed the previous two, those can be found here:

Part 1: Introduction
Part 2: Expression Visitor
Part 3: Where Clause Visitor (this)
Part 4: Compiling Expression Trees

The code accompanying this article can be found here:

C# to SQL provider sample download.

To reflect on the earlier discussion, which can be found by following the links above, the aim is to write a C# to SQL provider (parser) that would translate a C# statement into a database query. Once it's completed, the following example that we chose to base it upon should just work, having been translated into a SQL query string, together with a couple of compiled methods to generate the necessary query parameters, execute and read results:

(new DataModelQuery<FileModel>("TestQuery")) 
.Where(file => file.DirectoryId == test.Item1).GetCommand();

To continue from where the previous post left off - once the necessary transformations have been done on the expression tree in the main visitor, which takes care of parsing the select and from clauses, there's still the where clause - the responsibility of the where clause visitor. This is the last step in the parsing process. No, or very little transformation to the tree itself happens here. By this stage, all the branches are expected to have been visited and replaced, where necessary, with sub-query expressions, parameter references and etc. So what exactly does the where clause expression visitor do? Its main focus is on binary and unary expressions. For example:

customer.FirstName == "John" && customer.JoinDate > someDate && (customer.isValued || customer.totalSpend > 10000)

In the above example, there are multiple binary expressions, for example:

customer.FirstName == "John" && customer.JoinDate > someDate

Which can be broken down into two branching binary expressions as follows.

Left part:

customer.FirstName == "John"

Right part:

customer.JoinDate > someDate

There's only one unary expression in this example:

customer.isValued

The unary, in turn, consists of a constant/variable on the left (customer) and member access on the right (isValued).

So the job of the where clause visitor is to write these same conditions using SQL syntax. But, there's more than that. There's nothing more annoying than excessive and redundant brackets around each binary statement in the output text. It's not very pretty and much less readable. So, that means that there needs to be some extra logic to determine where parentheses are necessary. The || condition in the example above must remain enclosed, because otherwise the logic will change, which, for obvious reasons, is undesirable, as it would lead to bugs and unexpected results.

Also, at this stage there's an opportunity to simplify the code and refactor unnecessary bits. For example, !(!(!customer.isValued)) would simply be resolved to NOT [customer].[isValued]. All the extra NOTs would be edited away.

As there can be some significant syntactical differences between C# and SQL, the order of operands may need to change. For example, while the following example is valid for C# and swapping would not change anything, in SQL there's a very specific way of laying it out:

null != customer.firstName

Needless to say, that with a null operand, the equality operator should be different - IS instead of =. And the null itself must appear on the right, otherwise, if I recall correctly, it's a syntactical error.

So, let's have a look at what the where clause expression visitor is doing. It subclasses another class - clause visitor, just for the sake of code reusability and keeping things tidy. But there's only one method there really in our tutorial, albeit, a very important one - GetColumnReference.

GetColumnReference - determining the full column qualifier to be used within a clause

If you recall from the previous section, the main visitor replaces each constant-typed expression representing a data object with a custom expression that contains table metadata - ScopedTablesExpression. So we get column references from MemberExpression objects being visited by the clause visitor. By matching the property name to one of the key-value pairs in the member-column collection contained in the ScopedTablesExpression, the corresponding column name can be retrieved.

One important thing here is the flag that indicates whether to prefix the column name with the table alias (think dbo.Customer t1, where t1 is the alias) or use the column alias instead. And it all depends on where in the query it is referenced. The example below should illustrate this:

select s1t1_FirstName 
from 
( 
select s1t1.FirstName as s1t1_FirstName 
from dbo.Customer s1t1 
where s1t1.FirstName is not null 
) t1 
where s1t1_FirstName = 'John'

As you can see in the subquery, FirstName is qualified with the table alias, but in the outer query, it is referenced by its own alias only, where s1t1 stands for subquery 1, table 1. This is a convenient way of assigning unique names when parsing. Readability is given little thought in this naming scheme, as error resilience is primary concern.

Next, we'll take a look at how the where clause visitor handles constants, members, operators, their precedence, null and boolean values.

Parsing the where clause - constants

When the clause visitor looks at a constant expression, it needs to determine whether to inline or add a parameter. The logic for that is simple - all the primitives, such as booleans and integers, are written directly to the builder. Other values (which are essentially strings) are passed in as parameters. ADO.NET already has resilient escaping logic in there, so it makes no sense to not use it. In this case a parameter reference is inserted in place of the value (e.g. @1) and appended to the collection, which is then used to populate the SQL command object. So:

customer.FirstName == "John"

would be written as

t1.FirstName = @1

Where t1 is the customer table alias and @1 is the parameter passed to the database command object.

Parsing the where clause - binary expressions

This is where it starts to get quite interesting. If you take a look at the method that visits binary expressions, you may notice a piece of code that inverts those of boolean type, if the corresponding flag is set to true. This happens when the class comes across cases like !!isTrue. As you can see the double operator NOT cancels itself out, so there's really no reason to persist it, as we may as well just leave isTrue without the extra fluff. Also, cases like !(a = b) become a != b, which is more readable.

Once the inversion is done, if any, the VisitBinary method takes over. A lot of stuff happens in this method:

Determining whether the boolean expression needs to be written in the form a = 1, given that "a" is equal to true in C#, as while a boolean variable in a conditional statement by itself is fine in C#, SQL doesn't like that, as the counterpart in that language is a bit, and the parser expects the equality to be in the form: where a = 1.
Figuring out whether to put brackets around the left part and if the right one also needs them. This is done by looking at what type of expression is on the left or on the right and comparing it to the one that joins them, keeping operator precedence in mind. So, if the middle (the one that joins the two sides) operator has a higher precedence than its left cousin, then brackets are required. So a subtraction on the left would need to be wrapped in parentheses if it's joined by multiplication to an expression on the right (which could be an addition). So, in other words, the following should be respected: (a - b) * (c + d). Removing the braces would change the meaning and the result of the formula. This is simply math rules, it's not so much a language thing.
Deciding if the left and the right operands need to be swapped. This would happen if there was a null on the left, e.g. null == a. To translate it to SQL syntax, null would have to appear on the other side: a IS NULL.

Once these criteria are established, the parts of the binary expression are written to the string builder, with brackets where necessary and an operator in the middle. The SQL representation of the latter is determined in a method conveniently called GetBinaryOperator.

Unary expression in the where clause visitor and the rest

There's only one use for examining an unary expression here - the operator not. Boolean inversion flag is set here. Then the flow continues on to the child expressions, contained within the "not" one. If the next node is of the same type, the inversion happens once more, otherwise it's either one the following: binary, constant, extension, member.

We discussed how binary and constant expressions are parsed earlier. An extension, in our case, could be a custom-defined DbNullExpression or a parameter expression created in the main visitor. In the first instance, all that needs to be done is writing "null" to the builder. In the second, just like with a constant, a parameter is written to the text output and added to the collection.

A member expression is translated into a column reference, as mentioned earlier, e.g. t1.FirstName = @1.

So, in the end the where clause of the main example would be translated as follows.

Where(file => file.DirectoryId == test.Item1);

SQL

where [t1].[FileDirectoryID] = @1

[t1] here is the reference to the table from which records are being selected. If you take a look at the complete text of the final result, it will make more sense:

select [t1].[FileId] as [Files_FileId], [t1].[IsDeleted] as [Files_IsDeleted], [t1].[FileExtension] as [Files_FileExtension], [t1].[FileName] as [Files_FileName], [t1].[FileTypeID] as [Files_FileTypeID], [t1].[FileDirectoryID] as [Files_FileDirectoryID] 
from Files t1 
where [t1].[FileDirectoryID] = @1

In the next and last section, I'll discuss the final stages of the process - compilation and execution. Please keep reading by following the link below to find out how methods are dynamically generated and called to populate the query with parameters, run it and read the resultset into objects:

How to write a LINQ to SQL provider in C# and .NET 4.0+ (part 4 of 4) - compiling expression trees