Friday, October 15, 2021

EF Core 5.0: Split Queries

In this post, let's have a look at this nice feature "Split Queries" that got introduced with  EF Core 5.0.

Consider you have the following Models in your DbContext.

public class Category
{
    public int Id { getset; }

    public string Name { getset; }

    public ICollection<Product> Products { getset; }
}

public class Product
{
    public int Id { getset; }

    public string Name { getset; }

    public int CategoryId { getset; }

    public ICollection<Item> Items { getset; }
}

public class Item
{
    public int Id { getset; }

    public string Name { getset; }

    public int ProductId { getset; }
}

And if you do a query something like this,

List<Category> query = await context.Categories
    .
Include(x => x.Products)
        .
ThenInclude(x => x.Items)
    .
ToListAsync();
The generated query would be something like this.
SELECT [c].[Id], [c].[Name], [t].[Id], [t].[CategoryId], [t].[Name], [t].[Id0], [t].[Name0], [t].[ProductId]
FROM [Categories] AS [c]
LEFT JOIN (
    SELECT [p].[Id], [p].[CategoryId], [p].[Name], [i].[Id] AS [Id0], [i].[Name] AS [Name0], [i].[ProductId]
    FROM [Products] AS [p]
    LEFT JOIN [Item] AS [i] ON [p].[Id] = [i].[ProductId]
) AS [t] ON [c].[Id] = [t].[CategoryId]
ORDER BY [c].[Id], [t].[Id], [t].[Id0]
And at the same time, you might see a warning like this.

Compiling a query which loads related collections for more than one collection navigation either via 'Include' or through projection but no 'QuerySplittingBehavior' has been configured. By default Entity Framework will use 'QuerySplittingBehavior.SingleQuery' which can potentially result in slow query performance. To identify the query that's triggering this warning call 'ConfigureWarnings(w => w.Throw(RelationalEventId.MultipleCollectionIncludeWarning))'

So why is that? 

Let's look at the result of this query.
Result: SingleQuery
As you can see, I have only just one Category in the table, but the Category information is duplicated based on how many Products a particular Category has and how many Items each Product has. This duplication leads to the problem called "Cartesian Explosion"

Now how to address this?

As mentioned in the warning, by default Entity Framework will use 'QuerySplittingBehavior.SingleQuery'. We can change this behavior,
  • Per query or
  • Change the default globally
Per query, we can achieve this by adding AsSplitQuery(),
List<Category> query = await context.Categories
    .
Include(x => x.Products)
        .
ThenInclude(x => x.Items)
    .
AsSplitQuery()
    .
ToListAsync();
And now the generated query would change as follows,
SELECT [c].[Id], [c].[Name]
FROM [Categories] AS [c]
ORDER BY [c].[Id]

SELECT [p].[Id], [p].[CategoryId], [p].[Name], [c].[Id]
FROM [Categories] AS [c]
INNER JOIN [Products] AS [p] ON [c].[Id] = [p].[CategoryId]
ORDER BY [c].[Id], [p].[Id]

SELECT [i].[Id], [i].[Name], [i].[ProductId], [c].[Id], [p].[Id]
FROM [Categories] AS [c]
INNER JOIN [Products] AS [p] ON [c].[Id] = [p].[CategoryId]
INNER JOIN [Item] AS [i] ON [p].[Id] = [i].[ProductId]
ORDER BY [c].[Id], [p].[Id]
This would result in
Result: SplitQuery
Or we can change the QuerySplittingBehavior globally.
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder
        .
UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=EFCore5;Trusted_Connection=True"options =>
        {
            options.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery);
        })
        .EnableSensitiveDataLogging()
        .LogTo(Console.WriteLine, LogLevel.Information);
}
Then for some of the queries, if you want to use SingleQuery behavior, you can do AsSingleQuery().
List<Category> query = await context.Categories
    .
Include(x => x.Products)
        .
ThenInclude(x => x.Items)
    .
AsSingleQuery()
    .
ToListAsync();
Now there are some important factors to consider when using Split Queries.
  • If you use Split Queries over One-to-One related entities, it has no effect. Those are always loaded via JOINs in the same query, as it has no performance impact.
  • When the query is split, that means multiple round trips to the database.
  • There can be concerns over data consistency if the data is being updated concurrently. Because it's multiple queries.
  • Most databases allow only a single query to be active at any given point (except for SQL Server with Multiple Active Result Sets, Sqlite). So all results from earlier queries must be buffered in your application's memory, which leads to increased memory requirements.
Hope this helps.
 
Happy Coding.

Regards,
Jaliya

No comments:

Post a Comment