Search code examples
databasef#entity-framework-corecomputation-expression

F# query: find rows with max value in each group


In my project I use:

  • F# with query workflow
  • Entity Framework Core 3.11
  • MS SQL

I have a problem similar to this one: SQL select only rows with max value on a column, but I'm wondering how to express the SQL presented in that question using F# query workflow:

SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev

Solution

  • Direct translation

    For a direct translation, the inner query is:

    let innerQuery =
        query {
            for inner in ctx.YourTable do
            groupBy inner.id into grp
            select (grp.Key, grp.Max(fun ent -> ent.rev))
        }
    

    And then, in theory, I think the full query should be:

    query {
        for outer in ctx.YourTable do
        join inner in innerQuery
            on ((outer.id, outer.rev) = inner)
        select outer
    }
    

    However, that doesn't work:

    Type mismatch when building 'ty': function type doesn't match delegate type. Expected
    
    'Microsoft.FSharp.Core.FSharpFunc`2[Program+YourTable,System.Tuple`2[System.Int32,System.Int32]]'
    
    , but received type
    
    'Microsoft.FSharp.Core.FSharpFunc`2[Program+YourTable,Microsoft.FSharp.Linq.RuntimeHelpers.AnonymousObject`2[System.Int32,System.Int32]]'.
    

    I could be wrong, or it could be a bug/limitation. Perhaps someone has a workaround.

    Alternate translation

    However, if you'll accept a slightly different translation, it does work:

    query {
        for outer in ctx.YourTable do
        where (
            query {
                for inner in ctx.YourTable do
                groupBy inner.id into grp
                exists (grp.Key = outer.id && grp.Max(fun ent -> ent.rev) = outer.rev)
            })
        select outer
    }
    

    The generated SQL is:

    SELECT [y].[id], [y].[rev], [y].[contents]
    FROM [YourTable] AS [y]
    WHERE EXISTS (
        SELECT 1
        FROM [YourTable] AS [y0]
        GROUP BY [y0].[id]
        HAVING ([y0].[id] = [y].[id]) AND (MAX([y0].[rev]) = [y].[rev]))
    

    Output is:

    { id = 2
      rev = 1
      contents = "..." }
    { id = 1
      rev = 3
      contents = "..." }
    

    Note that I had to set a composite primary key of id, rev when building the model:

    override __.OnModelCreating(modelBuilder: ModelBuilder) =
        modelBuilder.Entity<YourTable>()
            .HasKey([| "id"; "rev" |]) |> ignore
    

    Full code is here.