The previous part demonstrated the SQL translations of the LINQ to Entities queries. This part discusses how the translation is implemented. Since different database systems can have different query languages or different query APIs, EF/Core implements a provider model to work with different kinds of databases. In EF Core, the base libraries are the Microsoft.EntityFrameworkCore and Microsoft.EntityFrameworkCore.Relational NuGet packages. Microsoft.EntityFrameworkCore provides the database provider contracts as the Microsoft.EntityFrameworkCore.Storage.IDatabaseProviderServices interface. The SQL database support is implemented by the Microsoft.EntityFrameworkCore.SqlServer NuGet package, which provides the Microsoft.EntityFrameworkCore.Storage.Internal.SqlServerDatabaseProviderServices type to implement IDatabaseProviderServices. There are other libraries for different databases, like the Microsoft.EntityFrameworkCore.Sqlite NuGet package for SQLite, etc.
In EF, the EntityFramework NuGet package contains 2 assemblies, EntityFramework.dll and EntityFramework.SqlServer.dll. The base library EntityFramework.dll provides the database provider contracts as the System.Data.Entity.Core.Common.DbProviderServices abstract class, and the SQL database provider library EntityFramework.SqlServer.dll provides System.Data.Entity.SqlServer.SqlProviderServices to implement DbProviderServices.
With this provider model, EF/Core breaks the translation into 2 parts. First, IQueryable query methods work with expression trees, and the EF/Core base libraries translate the .NET expression tree to a generic, intermediate database expression tree; then the specific EF/Core database provider is responsible for generating the query language for the specific database.
Before translation, a .NET expression tree must be built to represent the query logic. As aforementioned, an expression tree enables function as data. In C#, an expression tree shares the same syntax as a function, but is compiled to an abstract syntactic tree representing the function’s source code. In LINQ, IQueryable utilizes an expression tree to represent the abstract syntactic structure of a remote query.
namespace System.Linq
{
    public interface IQueryable<out T> : IEnumerable<T>, IEnumerable, IQueryable
    {
        // IEnumerator<T> GetEnumerator(); from IEnumerable<T>.

        // Type ElementType { get; } from IQueryable.

        // Expression Expression { get; } from IQueryable.

        // IQueryProvider Provider { get; } from IQueryable.
    }
}
It is a wrapper of an iterator factory, an element type, an expression tree representing the current query’s logic, and a query provider of IQueryProvider type:
IQueryProvider has CreateQuery and Execute methods, all accepting an expression tree parameter. The CreateQuery methods return an IQueryable query, and the Execute methods return a query result. These methods are called inside the Queryable methods.
As aforementioned, Queryable also provides 2 kinds of query methods: sequence queries returning an IQueryable query, and value queries returning a query result. Taking Where, Select, and First as examples, the following are their implementations:
They just build a MethodCallExpression expression, representing that the current query method is called. Then they obtain the query provider from the source’s Provider property. The sequence query methods call the query provider’s CreateQuery method to return an IQueryable query, and the value query methods call the query provider’s Execute method to return a query result. All Queryable methods are implemented in this pattern, except AsQueryable, which is discussed in the previous part.
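The pattern described above can be sketched as follows. This is only an illustration, not the actual System.Linq.Queryable source: the names QueryableSketch, WhereSketch, SelectSketch, and FirstSketch are hypothetical, and an in-memory array wrapped by AsQueryable is assumed as a stand-in for a real remote data source.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical re-implementation of the pattern; the real methods are Queryable.Where/Select/First.
public static class QueryableSketch
{
    // Sequence query: builds a MethodCallExpression, then asks the provider for a new query.
    public static IQueryable<TSource> WhereSketch<TSource>(
        this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
    {
        MethodInfo where = new Func<IQueryable<TSource>, Expression<Func<TSource, bool>>, IQueryable<TSource>>(
            Queryable.Where).Method;
        MethodCallExpression call = Expression.Call(
            null, where, source.Expression, Expression.Quote(predicate));
        return source.Provider.CreateQuery<TSource>(call);
    }

    // Sequence query: same pattern, different method and result element type.
    public static IQueryable<TResult> SelectSketch<TSource, TResult>(
        this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector)
    {
        MethodInfo select = new Func<IQueryable<TSource>, Expression<Func<TSource, TResult>>, IQueryable<TResult>>(
            Queryable.Select).Method;
        MethodCallExpression call = Expression.Call(
            null, select, source.Expression, Expression.Quote(selector));
        return source.Provider.CreateQuery<TResult>(call);
    }

    // Value query: calls Execute instead of CreateQuery, so a result is returned instead of a query.
    public static TSource FirstSketch<TSource>(this IQueryable<TSource> source)
    {
        MethodInfo first = new Func<IQueryable<TSource>, TSource>(Queryable.First).Method;
        MethodCallExpression call = Expression.Call(null, first, source.Expression);
        return source.Provider.Execute<TSource>(call);
    }
}

public static class QueryableSketchDemo
{
    public static string Run()
    {
        // In-memory stand-in for a remote source (assumption for this sketch).
        IQueryable<string> source = new[] { "ML Road Frame-W", "Chain", "HL Mountain Frame" }.AsQueryable();
        return source
            .WhereSketch(name => name.Length > 10)
            .SelectSketch(name => name.ToUpperInvariant())
            .FirstSketch();
    }
}
```

Because the sketch builds calls to the real Queryable methods, the in-memory query provider can execute the resulting expression tree like any other LINQ query.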
The above example filters the products with Name longer than 10 characters, and queries the products’ Names. By desugaring the lambda expressions, and unwrapping the query methods, the above LINQ to Entities query is equivalent to:
    using (IEnumerator<string> iterator = selectQueryable.GetEnumerator()) // Execute query.
    {
        while (iterator.MoveNext())
        {
            iterator.Current.WriteLine();
        }
    }
}
Here are the steps by which the fluent query builds its query expression tree:
Build data source:
The initial source IQueryable is a DbSet instance automatically created by EF/Core. It wraps:
A ConstantExpression expression representing the data source.
A query provider that implements IQueryProvider. In EF Core it is an automatically created EntityQueryProvider instance, and in EF it is DbQueryProvider.
Build Where query:
A predicate expression is built for Where.
Where accepts the IQueryable source. But actually Where only needs the source’s expression and query provider. A MethodCallExpression expression is built to represent a call of Where itself with 2 arguments, the source and the predicate expression. Then the source query provider’s CreateQuery method is called with the MethodCallExpression expression just built, and returns an IQueryable query, which wraps:
The MethodCallExpression expression representing the current Where call.
A query provider, which is the same one from the source.
Build Select query:
A selector expression is built for Select.
Select accepts the IQueryable returned by Where as source. Again, Select only needs the expression and query provider from the source. A MethodCallExpression expression is built to represent a call to Select itself with 2 arguments, the source and the selector expression. Then the source query provider’s CreateQuery method is called with the MethodCallExpression expression just built, and returns an IQueryable query, which wraps:
The MethodCallExpression expression representing the current Select call.
A query provider, which is the same one from the source.
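The steps above can be sketched in code. This is a minimal illustration under stated assumptions: Product is a hypothetical entity type with only a Name property, ExpressionTreeSteps is a made-up helper, and an in-memory IQueryable is assumed in place of a DbSet, so the query provider here is the LINQ to Objects one rather than EntityQueryProvider or DbQueryProvider.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity type standing in for the Product entity.
public class Product
{
    public string Name { get; set; }
}

public static class ExpressionTreeSteps
{
    public static IQueryable<string> Build(IQueryable<Product> sourceQueryable)
    {
        // Build data source: only its expression and query provider are needed.
        Expression sourceExpression = sourceQueryable.Expression;
        IQueryProvider sourceProvider = sourceQueryable.Provider;

        // Build Where query. Predicate: product => product.Name.Length > 10.
        ParameterExpression whereParameter = Expression.Parameter(typeof(Product), "product");
        Expression<Func<Product, bool>> predicate = Expression.Lambda<Func<Product, bool>>(
            Expression.GreaterThan(
                Expression.Property(
                    Expression.Property(whereParameter, nameof(Product.Name)),
                    nameof(string.Length)),
                Expression.Constant(10)),
            whereParameter);
        MethodInfo where = new Func<IQueryable<Product>, Expression<Func<Product, bool>>, IQueryable<Product>>(
            Queryable.Where).Method;
        IQueryable<Product> whereQueryable = sourceProvider.CreateQuery<Product>(
            Expression.Call(null, where, sourceExpression, Expression.Quote(predicate)));

        // Build Select query. Selector: product => product.Name.
        ParameterExpression selectParameter = Expression.Parameter(typeof(Product), "product");
        Expression<Func<Product, string>> selector = Expression.Lambda<Func<Product, string>>(
            Expression.Property(selectParameter, nameof(Product.Name)),
            selectParameter);
        MethodInfo select = new Func<IQueryable<Product>, Expression<Func<Product, string>>, IQueryable<string>>(
            Queryable.Select).Method;
        return whereQueryable.Provider.CreateQuery<string>(
            Expression.Call(null, select, whereQueryable.Expression, Expression.Quote(selector)));
    }
}
```

The returned query has not executed anything yet. Its Expression property holds a Select call node whose first argument is the Where call node, whose first argument in turn is the constant node of the data source.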
So, the final IQueryable query’s Expression property is the final abstract syntactic tree, which represents the entire LINQ to Entities query logic:
MethodCallExpression (NodeType = Call, Type = IQueryable<string>)
|_Method = Queryable.Select<Product, string>
|_Object = null
|_Arguments
  |_MethodCallExpression (NodeType = Call, Type = IQueryable<Product>)
  | |_Method = Queryable.Where<Product>
  | |_Object = null
  | |_Arguments
  |   |_ConstantExpression (NodeType = Constant, Type = IQueryable<Product>)
  |   |_UnaryExpression (NodeType = Quote, Type = Expression<Func<Product, bool>>)
  |     |_Operand
  |       |_Expression<Func<Product, bool>> (NodeType = Lambda, Type = Func<Product, bool>)
  |         |_Parameters
  |         | |_ParameterExpression (NodeType = Parameter, Type = Product)
  |         |   |_Name = "product"
  |         |_Body
  |           |_BinaryExpression (NodeType = GreaterThan, Type = bool)
  |             |_Left
  |             | |_MemberExpression (NodeType = MemberAccess, Type = int)
  |             |   |_Member = "Length"
  |             |   |_Expression
  |             |     |_MemberExpression (NodeType = MemberAccess, Type = string)
  |             |       |_Member = "Name"
  |             |       |_Expression
  |             |         |_ParameterExpression (NodeType = Parameter, Type = Product)
  |             |           |_Name = "product"
  |             |_Right
  |               |_ConstantExpression (NodeType = Constant, Type = int)
  |                 |_Value = 10
  |_UnaryExpression (NodeType = Quote, Type = Expression<Func<Product, string>>)
    |_Operand
      |_Expression<Func<Product, string>> (NodeType = Lambda, Type = Func<Product, string>)
        |_Parameters
        | |_ParameterExpression (NodeType = Parameter, Type = Product)
        |   |_Name = "product"
        |_Body
          |_MemberExpression (NodeType = MemberAccess, Type = string)
            |_Member = "Name"
            |_Expression
              |_ParameterExpression (NodeType = Parameter, Type = Product)
                |_Name = "product"
In EF, the difference is that the original IQueryable data source wraps a MethodCallExpression expression, which represents an ObjectQuery instance’s MergeAs instance method call with 1 argument, the MergeOption.AppendOnly enumeration. It means that new entities are appended to the entity cache if any entity is constructed by the query. Entity cache will be discussed in a later part.
This also demonstrates that lambda expressions, extension methods, and LINQ query expressions are powerful language features of C#. Such a rich abstract syntactic tree can be built by C# code as simple as:
Here the initial source and the Select query are the same as in the previous example. So this time, just unwrap the First method. The above First query is equivalent to:
In the First query, the MethodCallExpression expression is built in the same way to represent the current First call. The difference is that the query provider’s Execute method is called instead of CreateQuery, so that a query result is returned instead of a query.
Similarly, the last expression tree built inside First, is the final abstract syntactic tree, which represents the entire LINQ to Entities query logic:
MethodCallExpression (NodeType = Call, Type = string)
|_Method = Queryable.First<string>
|_Object = null
|_Arguments
  |_MethodCallExpression (NodeType = Call, Type = IQueryable<string>)
    |_Method = Queryable.Select<Product, string>
    |_Object = null
    |_Arguments
      |_ConstantExpression (NodeType = Constant, Type = IQueryable<Product>)
When LINQ to Entities queries are executed, by either pulling values from IQueryable or calling IQueryProvider.Execute, EF/Core compiles the .NET expression tree to a database expression tree.
The logic of LINQ to Entities can be represented by a .NET expression tree, and EF/Core also uses an expression tree to represent the database query logic. For example, the EF Core base libraries provide Microsoft.EntityFrameworkCore.Query.Expressions.SelectExpression to represent a database SELECT query:
DbQueryCommandTree’s Parameters property contains the parameters for the database query, and Query property is the top node of the DbExpression tree. They are similar to LambdaExpression’s Parameters and Body properties.
EF Core calls the third party library Remotion.Linq to compile LINQ expression tree to a query model, then EF Core compiles the query model to database expression tree, which is a SelectExpression instance. The following Compile method demonstrates how the compilation can be done. It accepts a LINQ expression tree, and returns a tuple of SelectExpression and its parameters, if any:
EF Core first calls Remotion.Linq library to compile LINQ query method call nodes to QueryModel. Under Remotion.Linq.Parsing.Structure.IntermediateModel namespace, Remotion.Linq provides IExpressionNode interface, and many types implementing that interface, where each type can process a certain kind of query method call, for example:
MethodCallExpression node representing Queryable.Where call is processed by WhereExpressionNode, and converted to Remotion.Linq.Clauses.WhereClause, which is a part of QueryModel
MethodCallExpression node representing Queryable.Select call is processed by SelectExpressionNode, and converted to Remotion.Linq.Clauses.SelectClause, which is a part of QueryModel
MethodCallExpression node representing Queryable.First or Queryable.FirstOrDefault call is processed by FirstExpressionNode, and converted to Remotion.Linq.Clauses.ResultOperators.FirstResultOperator, which is a part of QueryModel
etc. Then EF Core continues to compile QueryModel to SelectExpression. For example:
WhereClause is converted to predicate child nodes of the SelectExpression
SelectClause is converted to projection child nodes of the SelectExpression
FirstResultOperator is converted to limit child node of the SelectExpression
etc.
In EF, the aforementioned ExpressionConverter is a huge type. It has many nested translator types for all supported expression tree nodes. For example:
WhereTranslator compiles Queryable.Where node to DbFilterExpression node
SelectTranslator compiles Queryable.Select node to DbProjectExpression node
FirstTranslator compiles Queryable.First or Queryable.FirstOrDefault node to DbLimitExpression node
The above Where query’s predicate calls string.Length and compares the result to a constant. EF Core provides translator types under the Microsoft.EntityFrameworkCore.Query.ExpressionTranslators.Internal namespace to translate these .NET API calls. Here the MemberExpression node representing the string.Length call is processed by SqlServerStringLengthTranslator, and converted to a SqlFunctionExpression node representing a call to the SQL database function LEN:
There are many other translators to cover other basic .NET APIs of System.String, System.Enum, System.DateTime, System.Guid, System.Math, for example:
MethodCallExpression node representing string.Contains call (e.g. product.Name.Contains("M")) is processed by SqlServerContainsOptimizedTranslator, and converted to a BinaryExpression node representing SQL database int comparison, where the left child node is a SqlFunctionExpression node representing SQL database function CHARINDEX call, and the right child node is a ConstantExpression node representing 0 (e.g. CHARINDEX(N'M', product.Name) > 0)
MethodCallExpression node representing Math.Ceiling call is processed by SqlServerMathCeilingTranslator, and converted to SqlFunctionExpression node representing SQL database function CEILING call
MemberExpression node representing DateTime.Now or DateTime.UtcNow property access, is processed by SqlServerDateTimeNowTranslator, and converted to SqlFunctionExpression node representing SQL database function GETDATE or GETUTCDATE call
etc.
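To make the idea behind these translators concrete, here is a toy sketch of an expression visitor that maps a string.Length member access to a LEN function call and rejects method calls it does not know. It is not EF Core’s implementation: ToySqlTranslator and Product are hypothetical names, only int constants and the > and = operators are handled, and the output is a plain string rather than a database expression tree.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

// Hypothetical entity type, used only for this sketch.
public class Product
{
    public string Name { get; set; }
}

// Toy translator: walks a predicate's expression tree and emits SQL-like text.
public class ToySqlTranslator : ExpressionVisitor
{
    private readonly StringBuilder sql = new StringBuilder();

    public static string Translate(LambdaExpression predicate)
    {
        ToySqlTranslator translator = new ToySqlTranslator();
        translator.Visit(predicate.Body);
        return translator.sql.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        this.Visit(node.Left);
        switch (node.NodeType)
        {
            case ExpressionType.GreaterThan: this.sql.Append(" > "); break;
            case ExpressionType.Equal: this.sql.Append(" = "); break;
            default: throw new NotSupportedException($"Operator {node.NodeType} is not supported.");
        }
        this.Visit(node.Right);
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // string.Length member access is mapped to the LEN function.
        if (node.Member.DeclaringType == typeof(string) && node.Member.Name == nameof(string.Length))
        {
            this.sql.Append("LEN(");
            this.Visit(node.Expression);
            this.sql.Append(")");
            return node;
        }

        // A property of the lambda parameter is mapped to a column reference.
        if (node.Expression is ParameterExpression parameter)
        {
            this.sql.Append($"[{parameter.Name}].[{node.Member.Name}]");
            return node;
        }

        throw new NotSupportedException($"Member {node.Member.Name} is not supported.");
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        if (!(node.Value is int))
        {
            throw new NotSupportedException("Only int constants are supported in this sketch.");
        }
        this.sql.Append(node.Value);
        return node;
    }

    // Arbitrary method calls have no mapping here, so they are rejected.
    protected override Expression VisitMethodCall(MethodCallExpression node) =>
        throw new NotSupportedException($"Method {node.Method.Name} cannot be translated.");
}
```

For the predicate product => product.Name.Length > 10, Translate returns the text LEN([product].[Name]) > 10. A real translator produces database expression nodes such as SqlFunctionExpression instead of text, and leaves text generation to the provider’s SQL generator.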
There are also a few other APIs covered by other EF Core components. For example, in Remotion.Linq, a MethodCallExpression node representing an Enumerable.Contains or List.Contains call is converted to Remotion.Linq.Clauses.ResultOperators.ContainsResultOperator. Then in EF Core, ContainsResultOperator is processed by Microsoft.EntityFrameworkCore.Query.ExpressionVisitors.SqlTranslatingExpressionVisitor, and converted to an InExpression node representing the SQL database IN operation.
As aforementioned, EF provides nested translator types inside ExpressionConverter. There are also many other translators covering .NET APIs of System.String, Microsoft.VisualBasic.Strings, System.Decimal, System.Enum, System.DateTime, System.DateTimeOffset, Microsoft.VisualBasic.DateAndTime, System.Math, System.Guid, System.Nullable, System.Data.Spatial.DbGeography, System.Data.Spatial.DbGeometry, etc. For example:
MethodCallExpression node representing string.Contains call (e.g. product.Name.Contains("M")) is processed by StringContainsTranslator, and converted to a DbLikeExpression node representing SQL database LIKE operation (e.g. product.Name LIKE N'%M%').
MethodCallExpression node representing Math.Ceiling call is processed by CanonicalFunctionDefaultTranslator, and converted to DbFunctionExpression node representing SQL database function CEILING call
MemberExpression node representing DateTime.Now or DateTime.UtcNow property access, is processed by SqlServerDateTimeNowTranslator, and converted to DbFunctionExpression node representing SQL database function SYSDATETIME or SYSUTCDATETIME call
Similar to EF Core, in EF a MethodCallExpression node representing an Enumerable.Contains or List.Contains call is not processed by translators, but by System.Data.Entity.Core.Objects.ELinq.LinqExpressionNormalizer.
Apparently EF/Core can only compile the supported .NET API calls, like the above string.Length call. It cannot compile arbitrary API calls. The following example wraps the string.Length call and result comparison with constant into a custom predicate:
At compile time, the predicate expression tree has a MethodCallExpression node representing the FilterName call, which apparently cannot be compiled to SQL by EF/Core. In this case, EF Core executes FilterName locally.
When EF fails to compile a query, it throws an exception. So in EF, the above example throws NotSupportedException: LINQ to Entities does not recognize the method 'Boolean FilterName(System.String)' method, and this method cannot be translated into a store expression. To make it work, the Where query has to be manually specified as a local LINQ to Objects query:
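A minimal sketch of such a local query is shown here. It assumes a hypothetical Product type and FilterName predicate, and uses a generic IQueryable parameter in place of the actual entity set, so it illustrates only how AsEnumerable moves the remaining query methods from Queryable to Enumerable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity type, used only for this sketch.
public class Product
{
    public string Name { get; set; }
}

public static class LocalFilterSketch
{
    // Custom predicate: a remote query provider has no translation for this call.
    private static bool FilterName(string name) => name.Length > 10;

    public static List<string> Query(IQueryable<Product> source)
    {
        return source
            .Select(product => product.Name)  // Queryable.Select: still part of the expression tree.
            .AsEnumerable()                   // From here on, LINQ to Objects is used.
            .Where(name => FilterName(name))  // Enumerable.Where: an ordinary delegate, executed locally.
            .ToList();
    }
}
```

With this shape, only the Select part is handed to the query provider, and the filtering happens in memory after the names are loaded, so all names are transferred before any of them is filtered out.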
Not all database APIs have built-in .NET APIs to be translated from. For example, there is no .NET API mapping to the SQL database DATEDIFF function. EF provides mapping methods to address these scenarios. As aforementioned, EF implements a provider model, and these mapping methods are provided at 2 levels too:
In EntityFramework.dll, System.Data.Entity.DbFunctions provides mapping methods supported by all database providers, like DbFunctions.Reverse to reverse a string, DbFunctions.AsUnicode to ensure a string is treated as Unicode, etc. These common database functions are also called canonical functions.
In EntityFramework.SqlServer.dll, System.Data.Entity.SqlServer.SqlFunctions provides mapping methods from SQL database functions, like SqlFunctions.Checksum method for CHECKSUM function, SqlFunctions.CurrentUser for CURRENT_USER function, etc.
The following LINQ to Entities query calculates the number of days between current time and photo’s last modified time. In the following LINQ to Entities query expression tree, the MethodCallExpression node representing DbFunctions.DiffDays call can be compiled by EF, and is converted to a DbFunctionExpression node representing canonical function Edm.DiffDays call:
    //    [Extent1].[LargePhotoFileName] AS [LargePhotoFileName],
    //    DATEDIFF (day, [Extent1].[ModifiedDate], SysUtcDateTime()) AS [C2]
    //    FROM [Production].[ProductPhoto] AS [Extent1]
}
The following example filters the products’ names with a pattern. The SqlFunctions.PatIndex call is compiled by EF, and converted to a call to the SQL database function SqlServer.PATINDEX:
The SQL database provider of EF/Core provides a SQL generator to traverse the compiled database query abstract syntactic tree, and generate SQL database specific remote SQL query. EF Core provides SQL generator as Microsoft.EntityFrameworkCore.Query.Sql.IQuerySqlGenerator interface:
It is implemented by Microsoft.EntityFrameworkCore.Query.Sql.Internal.SqlServerQuerySqlGenerator. The SQL generator wraps a database expression tree inside, and provides a GenerateSql method, which returns Microsoft.EntityFrameworkCore.Storage.IRelationalCommand to represent the generated SQL:
Inside the last DbCommandDefinition.CreateCommand call, a SqlGenerator instance is constructed with SQL database’s version (detected with ADO.NET API SqlConnection.ServerVersion), and its GenerateSql method is called to generate SQL, then the generated SQL and parameters (provided by DbQueryCommandTree.Parameters) are wrapped into a DbCommand instance.
The above WhereAndSelectDatabaseExpressions and SelectAndFirstDatabaseExpressions methods build database expression trees from scratch. Take them as examples to generate SQL:
The SQL generator traverses the command tree nodes, and a specific Visit overload is called for each supported node type. It generates the SELECT clause from the DbProjectExpression node, the FROM clause from the DbScanExpression node, the WHERE clause from the DbFilterExpression node, the LIKE operator from the DbLikeExpression node, etc.
The SQL generator generates the TOP expression from the DbLimitExpression node, which is an example where the SQL database’s version matters. Inside the SqlGenerator.Visit overload for DbLimitExpression, TOP 1 is generated for SQL Server 2000 (8.0), and TOP(1) is generated for later versions.
So finally LINQ to Entities queries are translated to remote SQL database queries. The next part discusses the query execution and data loading.
Entity Framework/Core and LINQ to Entities (5) Query Translation Implementation