Search code examples
sqlsql-servergroup-bytop-n

Performance issue on selecting n newest rows in subselect


I have a database with courses. Each course contains a set of nodes, and some nodes contains a set of answers from students. The Answer table looks (simplified) like this:

Answer

id  | courseId |  nodeId |  answer
------------------------------------------------
 1  |   1      |   1     |  <- text ->
 2  |   2      |   2     |  <- text ->
 3  |   1      |   1     |  <- text ->
 4  |   1      |   3     |  <- text ->
 5  |   2      |   2     |  <- text ->
..  |  ..      |   ..    |  ..

When a teacher opens a course (i.e. courseId = 1) I want to pick the node that have received the most answers lately. I can do this using the following query:

with Answers as
(
   select top 50 id, nodeId from Answer A where courseId=1 order by id desc
)
select top 1 nodeId from Answers group by nodeId order by count(id) desc

or equally using this query:

select top 1 nodeId from 
    (select top 50 id, nodeId from Answer A where courseId=1 order by id desc)
    group by nodeId order by count(id) desc

In both querys the newest 50 answers (with the highest ids) are selected and then grouped by nodeId so I can pick the one with the highest frequency. My problem is, however, that the query is very slow. If I only run the subselect, it takes less than a second, and grouping 50 rows should be fast, but when I run the entire query it takes about 10 seconds! My guess is that sql server does the select and grouping first, and afterwards does the top 50 and top 1, which in this case leads to terrible performance.

So, how can I rewrite the query to be efficient?


Solution

  • You can add indexes to make your queries more efficient. For this query:

    with Answers as (
          select top 50 id, nodeId
          from Answer A
          where courseId = 1
          order by id desc
         )
    select top 1 nodeId
    from Answers
    group by nodeId
    order by count(id) desc;
    

    The best index is Answer(courseId, id, nodeid).