Search code examples
c#sqllinqsql-to-linq-conversion

SQL "IN" statement in linq query mistake, how resolve?


I have this query in SQL:

SELECT * 
FROM TableName
WHERE myData IN (SELECT MAX(myData) AS DATA_MAX  
                 FROM TableName 
                 GROUP BY id1, id2) 

I want replicate it in Linq (c#) - how can I do that?


Solution

  • Long story short - don't use LINQ, optimize the query and use a microORM like Dapper to map results to classes :

    var query = "Select * "
                "from ( select *, " +
                "       ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN " +
                "       From TableName ) T " +
                "where RN=1";
    var data = connection.Query<SomeType>(query);
    

    LINQ isn't a replacement for SQL. ORMs in general aren't meant to write reporting queries like this one.

    Reporting queries need a lot of optimization and usually have to change in production. You don't want to have to redeploy your application each time a query changes. In this case it's far better to create a view and map to it using a microOMR like Dapper.

    This specific query could require two table scans, one to calculate the maximum per id1,id2 and one to find the rows with matching mydata. The intermediate data would have to be spooled into tempdb too. If mydata is covered by an index, it may not be such an expensive query. If it isn't, all the data will be scanned twice.

    An alternative is to calculate the ranking of each row by mydata based on id1, id2. You can do this with one of the ranking functions like ROW_NUMBER, RANK, NTILE.

    Select * 
    from ( select *,
                  ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN
           From TableName) T 
    where RN=1
    

    You can use that query directly with Dapper or create a view and map your entities to the view, not the table itself.

    One option would be to crate a MyTableRanked view :

    CREATE VIEW MyTableRanked AS
    select *,
           ROW_NUMBER() OVER (partition by id1,id2 order by mydata desc) AS RN
    From TableName
    

    This would allow you to write :

    var query="Select * from MyTableRanked where RN=@rank";
    var data = connection.Query<SomeType>(query,new {rank=2});
    

    Allowing you to return the top N records per ID1,ID2 combination