The previous parts has discussed some aspects that can impact the performance of EF/Core and LINQ to Entities, and here is a summary:
Remote LINQ to Entities query can have better performance than local or hybrid query. An intuitive example is Last query for a table data source, which could query the entire table, load data to local, and query the last result locally. It is better to just have a remote query and only load the specific result.
Using Select to only query the data can have better performance than querying full entity.
Disabling entity tracking can improve the performance.
Disabling automatic change detection can improve the performance.
When adding a sequence of entities to repository, DbSet.AddRange/DbSet.RemoveRange call can have better performance than many DbSet.Add/DbSet.Remove calls.
And, in EF, with lazy loading, accessing an entity’s navigation property can cause additional database query round trips (the N + 1 queries problem). Eager loading can improve the performance by read all needed data with 1 single database query.
This part continues the discussion of performance.
// ) AS [GroupBy1]',N'@p__linq__0 nvarchar(4000)',@p__linq__0=N'AdventureWorks'
22
//SELECT
23
// [GroupBy1].[A1] AS [C1]
24
//FROM ( SELECT
25
//COUNT(1) AS [A1]
26
//FROM [dbo].[__MigrationHistory] AS [Extent1]
27
// ) AS [GroupBy1]
28
//SELECTTOP (1)
29
// [Extent1].[Id] AS [Id],
30
// [Extent1].[ModelHash] AS [ModelHash]
31
//FROM [dbo].[EdmMetadata] AS [Extent1]
32
//ORDER BY [Extent1].[Id] DESC
33
//SELECT
34
// [Extent1].[ProductCategoryID] AS [ProductCategoryID],
35
// [Extent1].[Name] AS [Name]
36
//FROM [Production].[ProductCategory] AS [Extent1]
37
}
38
}
39
}
Executing above code, a bunch of SQL queries can be traced. And only the last SELECT query is the expected LINQ to Entities query translation. Actually, before a database’s first operation at runtime (e.g., querying Production.ProductCategory table here), EF does a lot of work to initialize its object-relational mapping:
Initialize provider manifest
Initialize the entity data model. EF automatically builds the object models (CLR models, not above entities), conceptual models, storage models, object-conceptual model mappings, conceptual-storage model mappings, etc..
Initialize the database, if needed.
Initialize mapping views, which are the mapping information for entity sets.
Initialize a dynamic assembly “EntityFrameworkDynamicProxies-{OriginalAssemblyName}, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null”, and define proxy types in it.
The above initialization steps executes only once at runtime, and their performance can be improved from the default behavior.
As fore mentioned, EF implements the provider model to work with different kinds of data stores, and it need to get the basic information of current data store. For SQL database:
The SQL database server’s version is detected by calling DbConnection.ServerVersion
The engine edition is queried by above SERVERPROPERTY metadata function, to determine whether it is a on premise database (SQL Server) or cloud database (SQL Azure, aka Azure SQL Database).
For SQL database, the supported provider manifest tokens are:
For any on premise SQL database later than 11.0, just use “2012”. For cloud SQL database, use “2012.Azure”. In this tutorial, the server version and engine edition is known. So these information can be provided to EF via System.Data.Entity.Infrastructure.IManifestTokenResolver:
Then engine edition query is not executed during initialization. Notice EF only support defining a single type derived from DbConfiguration. In the object-relational mapping part, there us already a RetryConfiguration type defined to specify the retry strategy. The logic in both types must be merged intto one type, otherwise EF throws exception during initialization.
EF provides several built-in initializers under System.Data.Entity namespace:
NullDatabaseInitializer: Do nothing for initialization
DropCreateDatabaseAlways: Always drop the database and create again
DropCreateDatabaseIfModelChanges: Drop and create database when the code mapping mismatches database schema.
MigrateDatabaseToLatestVersion<TContext, TMigrationsConfiguration>: Use the specified code to update the database to the latest version.
CreateDatabaseIfNotExists: Create database if not exist.
CreateDatabaseIfNotExists: is the default initializer, so it is executed here too. As a result, EF attempts to query the existence of the mapped tables and views, database migration history, and entity data model info, etc. Apparently, here AdventureWorks database does not have the migration and entity data model info; recreating database is not needed as well. So the database initialization can be turned off, by setting the initializer to NullDatabaseInitializer:
In EF, mapping views are not the views inside the database. They are System.Data.Entity.Infrastructure.MappingViews.DbMappingView instances, representing the mapping information for entity sets. Instead of generate these instances at runtime, pre-generate them at design time can improve the performance. Microsoft provides a Visual Studio extension, EF Power Tools, to generate these code. It needs to be modified to installed with the latest Visual Studio. After the installation, just right click the code file containing the database mapping (the class derived from DbContext), and in the menu click EF => Generate Views, it generates a file, containing the code to create the DbMappingView instances.
After the object-relational mapping metadata is initialized, they are cached, so that the initialization only happens once for the AppDomain. EF/Core also implement cache for entities and query translation.
As fore mentioned, by default, the entities queried from repository are cached and tracked. This behavior can be demonstrated by the following example:
In this example, the first query reads data from the repository and materialize the data to a category entity, and update its Name. Then the repository is queried again by Name. After reading the data, EF/Core find the primary key is the same as the cached entity, so EF/Core do not materialize the data just read, it reuses the previous category entity. Performance can be improved by skipping the materialization, but tricky result can happen. The second query reads entity with Name “Bikes”, but the query result entity has Name “Cache”. This is not only LINQ to Entities queries’ behavior, When DbSet directly executes SQL query in the repository, EF/Core still uses cached entities.
Entity is not cached when tracking is turned off, or entity is not queried from the repository. Each of the following queries materializes a new entity:
DbSet.Find accept the primary keys and returns an entity. Calling Find can improve the performance, because it looks up cache before querying the repository:
Here when Find is called, entity with the specified primary key is already queries, cached and tracked, so Find directly returns the cached entity, without repository query or data materialization.
As discussed in the query translation part, EF/Core translate a LINQ to Entities query in 2 steps:
Compile LINQ expression tree to database expression tree
Generate SQL from database expression tree
To improve the performance, EF Core caches the query translations in a Microsoft.Extensions.Caching.Memory.MemoryCache. Before processing a LINQ query, EF Core computes the cache key, and looks up the cache. If the translation is found, then it reuses the translation; if not, it translates the query, and add the translation to cache.. For SQL database queries, the cache key’s hash code is computed with the the hash code of the following values:
The LINQ query expression tree. The LINQ query expression tree is scanned recursively, the hash code of the nodes and APIs represented by the expression tree nodes are used to compute the hash code of the entire expression tree.
DbContext.Model
DbContext.ChangeTracker.QueryTrackingBehavior, which is an enumeration of TrackAll or NoTracking
A Boolean value that indicates whether the query is executed asynchronously
SqlServerOptionsExtension.UseRelationalNulls, which can be specified with SqlServerDbContextOptionsBuilder.UseRelationalNulls
SqlServerOptionsExtension.RowNumberPaging, which can be specified with SqlServerDbContextOptionsBuilder.UseRowNumberForPaging
EF always compiles the LINQ expression tree to database expression tree, then cache the SQL generation in a dictionary. For example:
EF generates the cache key with the following values:
The database expression tree’s string representation. Here it is: [Filter](BV’LQ1’=([Scan](AdventureWorks.Products.collection[Product(Nullable=True,DefaultValue=)]))([>=](FUNC<Edm.Length(In Edm.String(Nullable=True,DefaultValue=,MaxLength=,Unicode=,FixedLength=))>((Var(‘LQ1’)[.]Name)),@p__linq__0.Int32(Nullable=False,DefaultValue=))))
The parameters’ string representation: @@1p__linq__0.Int32
The path of the Include query. Here it is ProductSubcategory
The query’s MergeOption, which is AppendOnly by default.
These first LINQ query builds expression trees with a ConstantExpression node representing int value 1. The second query builds similar expression tree but with a different ConstantExpression node representing int value 10. So these LINQ expression trees are different. In EF Core, the first expression tree’s translation cannot be reused for the second query.
In EF, their compiled database expression trees are different too, with 2 different DbConstantExpression nodes. The 2 database expression trees’ string representations are:
As discussed in the C# features chapter, the predicate lambda expressions capture variable minLength with the closure syntactic sugar. The above code is compiled to:
In the predicates, the outer variable access is compiled to field access. So in the LINQ queries’ expression trees, there are no longer ConstantExpression nodes representing different int values, but MemberExpression nodes representing the same field. As a result, the 2 query’s LINQ expression trees are identical, and the translation is reused.
In EF, if a query method accepts values instead of lambda expression, this parameterization approach does not work. For example, Skip and Take accept int values as parameters:
The above LINQ queries access to variable skip and take, but these variable access are also represented by ConstantExpression nodes. So their expression trees are different, and converted database command trees are different, and their translations cannot be reused for each other. To resolve this problem, EF provides a lambda expression version for these methods:
These LINQ queries have MemberExpression nodes again. EF can convert them to identical parameterized database expression trees. Now their translations can be reused for each other.
Generally, for long running I/O bound operation, asynchrony can improve the application responsiveness and service scalability. EF/Core support asynchrony for database CRUD operations, and these async APIs are very easy to use with C# async/await keywords. Please notice this does not mean all the synchronous API calls must be replaced by asynchronous API calls, the application must be tested to identify which API has better performance.
For LINQ to Entities queries, EF/Core start to read the data when values are pulled from IQueryable data source, for example:
Pull the values from the query represented by IQueryable.
Call a query method to return a single value from the IQueryable, like First, etc..
Call a LINQ to Objects query method to return a new collection, like ToArray, etc..
For these operations and APIs, async parities are provided as IQueryable extension methods. In EF Core, these async query APIs are also provided as extension methods in Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions:
async iteration method: ForEachAsync asynchronously pulls each value from IQueryable data source and call the specified function.
async methods to return a new collection: ToArrayAsync, ToDictionaryAsync, ToListAsync
In EF, these methods are provided in System.Data.Entity.QueryableExtensions, and LastAsync, LastOrDefaultAsync are not provided, since EF does not support Last, LastOrDefault.
For data changes, DbContext.SaveChangesAsync is provided as a parity of DbContext.SaveChanges. For example:
.FirstAsync(entity=> entity.Name.Contains("Bike")); // Async version of First.
9
subcategory.Name.WriteLine();
10
11
Product[] products=await adventureWorks.Products
12
.Where(product=> product.ListPrice <=10)
13
.ToArrayAsync(); // Async version of ToArray.
14
15
adventureWorks.Products.RemoveRange(products);
16
(await adventureWorks.SaveChangesAsync()).WriteLine(); // Async version of SaveChanges.
17
}
Transactions and connection resiliency with asynchronous operations#
These async APIs work in EF/Core transaction. In this tutorial, connection resiliency is enabled because cloud SQL database is used, so call the retry strategy’s ExecuteAsync method:
TransactionScope by default does not support across thread transaction flow. Using the the async/await syntactic sugar for TransactionScope causes InvalidOperationException: A TransactionScope must be disposed on the same thread that it was created.. To resolved this, Since .NET 4.5.1, a new constructor for TransactionScope is provided to explicitly enable transaction flow across thread continuations:
EF/Core also provide async APIs for other database operations. In the previous concurrency part, a DbContext.SaveChanges overload is implemented to handle concurrency conflict, refresh entity, and retry saving changes. Here a async version can be implemented easily:
With the async/await syntactic sugar, the implementation looks very similar to the synchronous version. The following are the SaveChangesAsync overloads to accept RefreshConflict enumeration:
Instead of calling the previously defined Refresh extension method to refresh the DbEntityEntry instance, here a async method RefreshAsync is called to refresh asynchronously: