I am writing a replacement system which deals with a large number of images. I've migrated what are deemed to be "live" images to a new database, but there is a requirement to search the old database through the UI to allow users to pull different images across.
I've put a view in my database which queries the old database so that I can expose it through Entity Framework. The view works well, and despite having several million records I get results pretty snappily.
My problem is when I try to run a query using LINQ to Entities. My search is paged, and limits results to 100 per page.
I have an IQueryable which I use to filter results, but for Skip/Take I have to add an order by. Example below shows getting page 2 of results (Skip 100, Take 100).
if (firstId.HasValue)
query = query.Where(x => x.Id >= firstId.Value);
if (!string.IsNullOrEmpty(groupCode))
query = query.Where(x => x.GroupCode == groupCode);
var daResults = query
.OrderBy(x => x.Id)
.Skip(100)
.Take(100)
.ToList();
The SQL this generates is:
SELECT
[Project1].[Id] AS [Id],
[Project1].[FolderPath] AS [FolderPath],
[Project1].[Filename] AS [Filename],
[Project1].[IsFlagged] AS [IsFlagged],
[Project1].[IsHidden] AS [IsHidden],
[Project1].[TextField1] AS [TextField1],
[Project1].[TextField2] AS [TextField2],
[Project1].[TextField3] AS [TextField3],
[Project1].[TextField4] AS [TextField4],
[Project1].[GroupCode] AS [GroupCode],
[Project1].[Deleted] AS [Deleted],
[Project1].[Created] AS [Created],
[Project1].[CreatedBy] AS [CreatedBy],
[Project1].[Updated] AS [Updated],
[Project1].[UpdatedBy] AS [UpdatedBy]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[FolderPath] AS [FolderPath],
[Extent1].[Filename] AS [Filename],
[Extent1].[IsFlagged] AS [IsFlagged],
[Extent1].[IsHidden] AS [IsHidden],
[Extent1].[TextField1] AS [TextField1],
[Extent1].[TextField2] AS [TextField2],
[Extent1].[TextField3] AS [TextField3],
[Extent1].[TextField4] AS [TextField4],
[Extent1].[GroupCode] AS [GroupCode],
[Extent1].[Deleted] AS [Deleted],
[Extent1].[Created] AS [Created],
[Extent1].[CreatedBy] AS [CreatedBy],
[Extent1].[Updated] AS [Updated],
[Extent1].[UpdatedBy] AS [UpdatedBy]
FROM [dbo].[view_ProEditBulkImageInfo] AS [Extent1]
WHERE
([Extent1].[Deleted] IS NULL)
AND ([Extent1].[Id] >= 300000)
) AS [Project1]
ORDER BY row_number() OVER (ORDER BY [Project1].[Id] ASC)
OFFSET 100 ROWS FETCH NEXT 100 ROWS ONLY
The issue seems to be that the first projection is fully evaluated (around 10 million records) which are then sorted before the top 100 rows are returned. The query execution plan shows the SORT cost as 92% of the batch.
This query takes around 30 seconds, just on the cusp of the timeout, so it's hit and miss as to whether it even returns.
I'm looking for some hints on how to speed this thing up, queries against the view are super quick (< 1 second).
I got to the end of writing this question, and then struck upon a slightly different approach. I thought I'd post anyway because it may be useful.
I've changed my code structure. I now do an extended .Take() first. This gets me all pages up to, and including, the page I want to return. Then I do the order by and skip to get only the page I want.
query = query
.Take((page.GetValueOrDefault(0) + 1) * recordCount.GetValueOrDefault(100));
// Now skip to the required page.
daResults = daResults
.OrderBy(x => x.Id)
.Skip(page.GetValueOrDefault(0) * recordCount.GetValueOrDefault(100))
.ToList();
The original Skip/Take results in the following SQL, which needs that internal query that was previously fully evaluated, and is slow:
ORDER BY row_number() OVER (ORDER BY [Project1].[Id] ASC)
OFFSET 100 ROWS FETCH NEXT 100 ROWS ONLY
Changing it, that internal query is much smaller. That internal sub-query uses SELECT TOP(200), which is lightning fast, and then applies the OFFSET etc. to the reduced results
I'm still only enumerating results (.ToList()) after all this has happened, so it all stays in the database and the results are now pretty much instant again.