Search code examples
sqlsqliteoptimizationwindow-functionsrow-number

How can I improve performance of this row_number() ranking window function query?


Table user_power_rank:

name type pk
id string 1
userid integer 0
power integer 0
atime integer 0

Indexes of this table:

index name columns
i_power_desc_atime_asc power desc, atime asc
i_id_power id, power

Query to get user rank:

SELECT * FROM (
  SELECT id,power,row_number() OVER (ORDER BY power DESC, atime ASC) ranking
  FROM user_power_rank
) WHERE id="the-data-id"

Run time of this query:

data count avg query time(ms)
10000 17.81
50000 101.32
100000 218.69

Performance is bad. Query plan:

id parent detail
2 0 CO-ROUTINE SUBQUERY 1
5 2 CO-ROUTINE SUBQUERY 3
9 5 SCAN user_rank_power USING INDEX i_power_desc_atime_asc
24 2 SCAN SUBUERY 3
63 0 SCAN SUBQUERY 1

How do I optimize performance?


Solution

  • Since your ranking is simply the global row_number(), it is equivalent to the number of rows that do sort "above". Instead of establishing the row_number() for all rows and then picking one result, we can count the number of rows that sort "above" that row for that exact sort-order; that number of rows is the value row_number() would have had.

    SELECT id, power,
           (SELECT COUNT(*)
            FROM user_power_rank AS inner
            WHERE (inner.power > user_power_rank.power
                   OR (inner.power = user_power_rank.power
                       AND inner.atime <= user_power_rank.atime))) AS ranking
    FROM user_power_rank
    WHERE id = ?;
    

    For 200,000 rows in user_power_rank the above query executes in 0.04s instead of 0.24s; for 1,000,000 rows it executes in 0.18s instead of 1.6s. Your mileage may vary depending on values for power are distributed.

    One needs to be careful about how NULL-values would have been treated by ORDER BY compared to >/<, if NULL-values in power/atime are allowed at all.

    Notice that in the construction of this query

    • inner.power > user_power_rank.power is equivalent to OVER (ORDER BY power DESC)
    • OR (inner.power = user_power_rank.power AND inner.atime <= user_power_rank.atime) is equivalent to OVER (..., atime ASC), but only for those values where power does not establish an order (because the values are equal).

    Also notice that with the original query there is ambiguity with respect to the value of ranking if there are multiple rows with the same (power, atime)-values; this would have been solved by OVER (ORDER BY power DESC, atime ASC, id) (notice id at the bottom of the sort-tree) to guarantee unambiguous order. This ambiguity is preserved with this query, which may cause it to return different result than OP's query. This is not an error, the exact sort-order had been ambiguous to begin with.