Writing your very own LINQ-to-SQL engine
The code accompanying this article can be found here: C# to SQL provider sample download.
For other parts in the series, please refer to the index below:
Background
Two years after having finished the project, I finally decided to publish a series of posts on creating a C# to SQL query provider from scratch. Many years ago, I conceptualized some sort of a translator from .NET code into SQL to make data querying a more controlled, structured experience. That was back in .NET 2.0 days. Then along came LINQ to SQL, which wasn't very optimized in its first iterations and tended to produce unwieldy and overblown SQL queries. The performance was also unremarkable because it wasn't compiled, but interpreted. That is, whenever a LINQ statement was executed, the expression visitor kicked in every time, generating command text. And believe me, that is a very slow process, as in many cases SQL Server would execute the end result faster than it would take to generate the query! It got improved over time, with an option to create compiled statements being added in later framework versions.
Expressions namespace
When LINQ-to-SQL was introduced into the framework, along with it, arguably, one of the best features since .NET 2.0 was added – the expressions namespace. I've used it extensively for many years and numerous projects, and have the boldness to consider myself an expert in it. It allows to express programming logic in a series of expressions, forming a generic template that can be applied to a combination of inputs. There are two main areas where .NET expressions are very useful. One involves dynamic compilation scenarios where IL emit is typically used. The other, which made LINQ-to-SQL possible, is expression-typed lambda statements. That is, when the assignment target of a lambda is defined as an expression then the compiler will create a new instance of the latter. In other words, lambdas are syntactic sugar not only for delegates, but also expression objects. This is a brilliant design idea on behalf of the team behind .NET. This makes it possible not only for them, but for developers to write all kinds of extensions that integrate naturally and seamlessly into the language syntax, abstracting the magic away from the consumer.
The power of .NET expression trees
I used expressions to write high-performance components, such as serializers, formatters. Typically - any .NET reflection-based logic that needs all the benefits of reflection without the performance overhead imposed by it. An expression tree can be compiled to a method, making its execution just as fast as statically compiled code.
However, as mentioned previously, there's another useful feature of .NET expressions – seamless integration into the core language through expression-typed lambda methods. That's where the difference between LINQ-to-objects and LINQ-to-SQL lies. The extension methods of the former all take regular lambda delegate types as arguments. But! The data LINQ methods all take expression-typed lambda delegate arguments. Check for yourself!
The whys
So why re-invent the wheel? There are multiple reasons why I chose to write my own integrated query parser as opposed to using LINQ-to-SQL:
Getting started with your own query provider
I will not sugar-coat this. Writing a functional, practical and production-usable query provider is not an exercise for the faint of heart. For me, most of the work was stretched over 3 months of writing code whenever I could find the time outside of my main job. Fortunately, it wasn't the busiest time of my life, but the degree of dedication required was fairly high. So, overall, I'd probably allow at least 100 hours for this exercise and scale it up or down, based on complexity and level of experience. The actual programming wasn't the most time-consuming part of it. Probably, only about 20%. The rest was thinking, white-board time and brainstorming.
The components of an integrated language query provider
So, just like in any engine, there are several logical components that work together to produce the output:
That, in essence, is it! For educational purposes, let's make things simpler and only focus on one method in this series – “where”. The full implementation, which is included with the sample application, includes more methods, which, in my experience, are enough for most queries. Any complex scenarios that extend beyond the supported capabilities can usually be covered by views and other means. Let's get started!
Language integrated query extension methods
To be in line with LINQ-to-objects, the SQL implementation also relies on extension methods to build up the expression tree, although, there appears to be no reason other than compatibility for not using instance methods. We'll follow the same pattern and define extension methods representing query operations. Or, rather, one operation – the where clause filtering. Joins, ordering and other constructs are beyond of the scope of this article, but are available in the full implementation.
The complete code of the class containing the extension methods can be seen below. The explanations will follow.
public static partial class SqlQueryProvider { private const int mMaxTablesInResultSet = 3; private static MethodInfo[] mWhereMethodInfo = new MethodInfo[mMaxTablesInResultSet]; public static DataModelQueryCommand<List<T>> GetCommand<T>(this IDataModelQuery<T> query) where T : new() { ResultSetCommand<T> objCommand = ResultSetCommand<T>.Create(GetQuery<T>(query)); return objCommand; } //end method public static DataModelQueryCommand<List<DbQueryResultItem<T, T2>>> GetCommand<T, T2>(this IDataModelQuery<T, T2> query) where T : new() where T2 : new() { ResultSetCommand<DbQueryResultItem<T, T2>> objCommand = ResultSetCommand<DbQueryResultItem<T, T2>>.Create(GetQuery<T>(query)); return objCommand; } //end method public static DataModelQueryCommand<List<DbQueryResultItem<T, T2, T3>>> GetCommand<T, T2, T3>(this IDataModelQuery<T, T2, T3> query) where T : new() where T2 : new() where T3 : new() { ResultSetCommand<DbQueryResultItem<T, T2, T3>> objCommand = ResultSetCommand<DbQueryResultItem<T, T2, T3>>.Create(GetQuery<T>(query)); return objCommand; } //end method public static IDataModelQuery<T> Where<T>(this IDataModelQuery<T> query, Expression<Func<T, bool>> predicate) { if (mWhereMethodInfo[0] == null) mWhereMethodInfo[0] = (MethodInfo)MethodBase.GetCurrentMethod(); MethodInfo objMethod = mWhereMethodInfo[0].MakeGenericMethod(new Type[] { typeof(T) }); return GetQueryForMethodCall<T>(query, objMethod, predicate); }//end method public static IDataModelQuery<T, T2> Where<T, T2>(this IDataModelQuery<T, T2> query, Expression<Func<DbQueryResultItem<T, T2>, bool>> predicate) { if (mWhereMethodInfo[1] == null) mWhereMethodInfo[1] = (MethodInfo)MethodBase.GetCurrentMethod(); MethodInfo objMethod = mWhereMethodInfo[1].MakeGenericMethod(new Type[] { typeof(T), typeof(T2) }); return (IDataModelQuery<T, T2>)GetQueryForMethodCall<T>(query, objMethod, predicate); }//end method public static IDataModelQuery<T, T2, T3> Where<T, T2, T3>(this IDataModelQuery<T, T2, T3> query, Expression<Func<DbQueryResultItem<T, T2, T3>, bool>> predicate) { if (mWhereMethodInfo[2] == null) mWhereMethodInfo[2] = (MethodInfo)MethodBase.GetCurrentMethod(); MethodInfo objMethod = mWhereMethodInfo[2].MakeGenericMethod(new Type[] { typeof(T), typeof(T2), typeof(T3) }); return (IDataModelQuery<T, T2, T3>)GetQueryForMethodCall<T>(query, objMethod, predicate); }//end method private static DataModelQueryExt<T> GetQueryForMethodCall<T>(IDataModelQuery query, MethodInfo method, Expression argumentValue) { DataModelQueryExt<T> objQuery = GetQuery<T>(query); if (argumentValue == null) objQuery.mExpressionTree = Expression.Call(method, objQuery.mExpressionTree); else objQuery.mExpressionTree = Expression.Call(method, objQuery.mExpressionTree, argumentValue); return objQuery; }//end method private static DataModelQueryExt<T> GetQuery<T>(IDataModelQuery query) { //Must be immutable if (query is DataModelQueryExt<T>) return ((DataModelQueryExt<T>)query).Copy(); if (query is IDataModelQueryExt) return new DataModelQueryExt<T>(((IDataModelQueryExt)query).ExpressionTree, query.QueryName); return new DataModelQueryExt<T>(Expression.Constant(query), query.QueryName); }//end method }
Essentially, there are two generic public methods with three overloads, for result-sets containing one, two or three tables. Although, as the join clause isn't covered here, only one overload will be used. There are also a couple of private methods that either create the query container object or wrap it in each subsequent call expression. All extension methods are called on an object of type IDataModelQuery, which is the container for the expression tree. When the “where” method is called on the object, we need to determine if the object already contains an expression tree, which will be wrapped in an Expression.Call expression as the instance on which the method is called, with the predicate as the first argument. If there is no expression tree yet, it needs to be initiated by encapsulating the query object inside an Expression.Constant. Why? Because the object representing the query is a generic type with the generic argument being the type of a result-set item. Later, in the expression visitor, we'll need to get the type of the data item in order to access information about its fields/columns, as well as which database table it represents.
The complete source code for C# to SQL integrated query provider can be downloaded from here.
In the next section, we'll examine how the provider parses the expression tree chained together from the extension methods. Click the link below to continue to the next section:
C# to SQL provider - Parsing the expression tree in expression visitor.