Search code examples
sqlsql-serverdatabase-performancequery-performance

Performance when using distinct and row_number pagination


I have a SQL something like this:

SELECT A,B,C,FUN(A) AS A FROM someTable

The problem is FUN() is a function which quite slow, so if there are a lot of records in someTable, there will be a big performance issue.

If we using a pagination, we can resolve this issue, we do the pagination like this:

SELECT * FROM(
SELECT A,B,C,FUN(A), Row_number()OVER( ORDER BY B ASC) AS rownum FROM someTable
)T WHERE T.rownum >=1 AND T.rownum<20

In this script, the FUN() will only execute 20 times so the performance is OK. But we need use alias to order by, so we can't write rownum inline, have to move to sub query or CTE, we chose CTE and it looks like this:

;WITH CTE AS (
   SELECT A,B AS alias,C,FUN(A) FROM someTable
)
SELECT * FROM(
SELECT *,Row_number()OVER( ORDER BY alias ASC) AS rownum FROM CTE
)T WHERE T.rownum >=1 AND T.rownum<20

So far we are going fine, we get pagination to solve performance issue, we solve the alias order problem, but somehow we need to add DISTINCT to the query:

 ;WITH CTE AS (
       SELECT DISTINCT A,B AS alias,C,FUN(A) FROM someTable
    )
    SELECT * FROM(
    SELECT *,Row_number()OVER( ORDER BY alias ASC) AS rownum FROM CTE
    )T WHERE T.rownum >=1 AND T.rownum<20

After this, the optimize of this SQL seems gone, the FUN() will execute many times as much as the records count in someTable, we get the performance issue again.

Basically we are blocked at here, is there any suggestions?


Solution

  • The problem is that in order to get distinct values, the database engine must run the fun(a) function on all the records being selected.

    If you do the fun(a) only in the final select, the distinct should not effect it, so it should run only on the final 20 records.

    I've changed the derived table you've used to another cte (but that's a personal preference - seems to me more tidy not to mix derived tables and ctes):

    ;WITH CTE1 AS (
        SELECT DISTINCT A,B AS alias,C
        FROM someTable
    ), 
    CTE2 AS
    (
        SELECT *, ROW_NUMBER() OVER(ORDER BY alias) As RowNum
        FROM CTE1
    )
    
    SELECT *, FUN(A)
    FROM CTE2
    WHERE RowNum >= 1 
    AND RowNum < 20
    

    Please note that since the fun function is not deterministic you might get results that are different from your original query - so before adapting this solution compare the results first.