Search code examples
sql-serverquery-optimization

SQL Server : optimize query. Lots of data


At the beginning I apologize for not being word-perfect in English.

I have two tables in my database, one contains questions, and second contains user answers for questions (for statistics).

TableA - questions

 ___________
| ID | Name |

TableB - Statistics

 ___________________________________
| ID | A_ID | U_ID| IsCorrect | Date|

User can answer one question several times, for example if we have question with ID = 1 and user (with id 2) which answered this question 4 times, we will add 4 rows to TableB:

 ___________________________________
| ID | A_ID | U_ID| IsCorrect | Date|
-------------------------------------
| 1  |  1   |  2  |   True    | Date|
| 2  |  1   |  2  |   False   | Date|
| 3  |  1   |  2  |   False   | Date|
| 4  |  1   |  2  |   True    | Date|

At the end, I have to query for questions (TableA) which user has not responded or answered but the smallest number of times (user is able to answer all questions).

My query (procedure) looks like:

Declare @max int

SET @max = (SELECT TOP 1 Count(A_ID) as QuestionCount  FROM [TableB]
Where User_id = 1
GROUP BY A_ID
ORDER BY QuestionCount DESC)

SELECT TOP 40 ID 
FROM [dbo].[TableA]
WHERE ID NOT IN (SELECT A_ID 
                 FROM [dbo].[TableB]
                 WHERE User_id = 1
                 GROUP BY A_ID
                 HAVING Count(A_ID) = @max)
ORDER BY NewID()

At the beggining I query for max occurence of question - If user answered some question 4 time @max will be 4.

In second query I query for question which weren't answered yet (in this occurence).

Question is: How to optimize this query (or maybe I should change my tables)? TableB for now has almost one million rows and beacause of that it isn't fast enough.


Solution

  • With SQL-Server (>=2008) you can use the OVER clause (https://msdn.microsoft.com/en-us/library/ms189461.aspx) which gives you grouped aggregats.

    EDIT: Just found your ORDER BY NewID() Why do you do this? NewID() is very bad to sort... 1 million is not so much in fact, but 1 million GUIDs without an index are a mass...