c#.net entity-framework linq linq-to-entities

Super slow LINQ Max() function execution

Trying to figure out why the LINQ Max function is actually way slower than the raw SQL query.

I am using a quite big SQLite file (300MB, 2-3 million rows).
.NET Core and EF Core.

var watch = Stopwatch.StartNew();
// LINQ Max function. This part takes around 2300 ms
ulong max = context.Messages.Select(m => m.Id).Max();
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds + "ms - Max");

watch.Restart();
// Raw SQL. This part takes 1 ms
var max2 = context.Messages.FromSql("SELECT * FROM messages WHERE id=(SELECT MAX(id) from messages)").ToList();
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds + "ms - MAX SQL");

Results (not on first run) :

2308ms - Max
1ms - MAX SQL

EDIT - Profiler

I turned on profiler and found out I had this problem with this code :

var watch = Stopwatch.StartNew();
var maxId = context.Messages.Max(m => m.Id);
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds + "ms - Max ");

Since I have this warning :

Microsoft.EntityFrameworkCore.Query:Warning: The LINQ expression 'Max()' could not be translated and will be evaluated locally.

So it executes this weird SQL query :

SELECT "m"."Id"
FROM "Messages" AS "m",

Github issue above says :

In linq/C# when you call aggregate methods like Average/Min/Max on an enumerable of non-nullable type which is empty, it throws exception.

Thus I tried to fix the problem with this new code

var maxId = context.Messages.Max(m => (ulong?)m.Id);

But still got the warning. What am I missing?

EDIT2 - Alternative

Quick fix/alternative for the slow Max() function, not using raw-SQL statement :

var message = context.Messages.OrderByDescending(m => m.Id).FirstOrDefault();

This one does not give any warning.

Solution

The first statement (Select) doesn't execute anything against the DB so it should be 0ms. The behaviour of statements like Max() will vary by Database adapter.

I would look to try:

var maxId = context.Messages.Max(m => m.Id);

and for comparison:

var message1 = context.Messages.Where(m => m.Id == context.Messages.Max(x => x.Id)).Single();
//vs
var message2 = context.Messages.FromSql("SELECT * FROM messages WHERE id=(SELECT MAX(id) from messages)").Single();

If this is a "cold" test against a DbContext, I would also look to eliminate any warm-up time for the context on first run. I'd expect that would have been satisfied by the initial Select call, though the 0ms time would be suspicious unless something touched the DbContext before that call. To be sure:

var warmup = context.Messages.First();

//start timing here...
var maxId = context.Messages.Select(m => m.Id).Max();
maxId = context.Messages.Max(m => m.Id);

var message1 = context.Messages.Where(m => m.Id = context.Messages.Max(x => x.Id)).Single();
//vs
var message2 = context.Messages.FromSql("SELECT * FROM messages WHERE id=(SELECT MAX(id) from messages)").Single();

From there, if there is a difference between the last 2 calls, I would hook up a profiler (if possible) and inspect the queries sent to the database. The EF query (message1) should differ primarily by that it will explicitly list the columns in the entity rather than SELECT *.

Edit: Ok, after a bit of experimentation with this against a large, manufactured table of 3 million records.. Some interesting results. (Run against SQL Server)

var stopWatch = Stopwatch.StartNew();
var query = context.Messages.Select(x => x.MessageId).Max();
stopWatch.Stop();
Console.WriteLine(stopWatch.ElapsedMilliseconds + "ms. (Select.Max)" + "  " + query);
stopWatch.Restart();

query = context.Messages.Select(x => x.MessageId).Max(x=>x);
stopWatch.Stop();
Console.WriteLine(stopWatch.ElapsedMilliseconds + "ms. (Select.Max(x=>))" + "  " + query);
stopWatch.Restart();

query = context.Messages.Max(x => x.MessageId);
stopWatch.Stop();
Console.WriteLine(stopWatch.ElapsedMilliseconds + "ms. (Max(x=>))" + "  " + query);
stopWatch.Restart();

var result = context.Messages.Where(x => x.MessageId == context.Messages.Max(y => y.MessageId)).Single();
stopWatch.Stop();
Console.WriteLine(stopWatch.ElapsedMilliseconds + "ms. (get by Max(x=>))"  + "  " + result.MessageId);

This netted results of:

257ms. (Select.Max)  3000000
4ms. (Select.Max(x=>))  3000000
3ms. (Max(x=>))  3000000
210ms. (get by Max(x=>))  3000000

The interesting result was: .Select(x => x.MessageId).Max(); vs. .Select(x => x.MessageId).Max(x => x);

The first took 257ms while the second was 4ms. However, reversing the queries saw the results being the same. The .Select().Max(x => x) flavour took the 257ms, while Select.Max() took 4 ms.

Running a profiler against the runs showed that all queries didn't even register an execution cost. 0ms each. So this time was being taken by EF internally.

What is curious is that the first three scenarios to get the max ID run exactly the same SQL..

SELECT 
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT 
        MAX([Extent1].[MessageId]) AS [A1]
        FROM [dbo].[Messages] AS [Extent1]
    )  AS [GroupBy1]
go

Separating the runs into separate DbContext instances did not have an immediate impact. Again the first run took a performance hit, with the subsequent runs.

By moving the query that selects the entity with the criteria using Max() to be the first query run yielded still more interesting results:

486ms. (get by Max(x=>))  3000000
50ms. (Max(x=>))  3000000
14ms. (Select.Max(x=>))  3000000
2ms. (Select.Max)  3000000

It was notably slower running it first, but the following queries were fast-ish with no 200ms runs, but not as "fast" individually.

Diving a bit deeper I tried: .OrderByDescending(x => x.MessageId).Select(x => x.MessageId).Take(1).Single()

Executed first it had an execution time of 100ms as opposed to the 230+ms for the .Max() while when executed after a Max() the execution time was ~50ms.

On a long-shot, turning off lazy loading and proxy creation had no effect. Using .AsNoTracking() also had little to no impact on the query times.

While this difference in execution time is not really explainable given the SQL being run is identical, and query execution times do not explain any difference, it is clear that EF is doing something beyond the warm-up behind the scenes. In my test case against SQL Server, the execution time differences are noticeable but not enough to explain what is being seen in SQLite. I suspect something with the SQLite adapter may be introducing a much more expensive query so I would be investigating profiling options to confirm what queries are being executed.