Search code examples
mysqlquery-optimizationleft-joindatabase-optimization

Can left join be optimized in 'preferred users' MySQL query?


I have two tables:

poll_response (poll_id, option_id, user_id) (about 500,000 rows, with 500 unique polls, 1000 unique options, and 25000 unique users)

preferred_users (user_id) (about 800 rows)

I'd like to determine what percentage of the users who select each option are "preferred users" (i.e. those with high reputation). Other users can respond to the poll; in order to identify that the response came from a preferred user, the join is required to the preferred_users table.

Here's what I've got:

SELECT option_id, count(*) AS all_votes, count(preferred_users.user_id) AS preferred_votes
FROM response
LEFT JOIN preferred_users ON response.user_id = preferred_users.user_id
GROUP BY option_id

The query spits out a table like this:

| option_id | all_votes | preferred_votes
| 1         | 500       | 150
| 2         | 550       | 250
| 3         | 525       | 300

I can then do the math to determine the percentage.

The problem is that the query often times out -- meaning it takes well over a minute to complete.

Is there any way to get rid of the left join or otherwise optimize the query?


Solution

  • have you tried splitting it into two queries - one for the total and one for the preferred users? i suspect that what is causing it to run slowly is running through the entries in group counting the non-null entries (but you could see yourself by using explain).

    in other words:

    select option_id, count(*) from response group by option_id
    select option_id, count(*) from response, preferred_users where response.user_id = preferred_user.id group by option_id
    

    you could even join them:

    select * from 
      (select option_id, count(*) as total from response group by option_id
       left join
       select option_id, count(*) as preferred from response, preferred_users where response.user_id = preferred_user.id group by option_id
       using (option_id))
    

    (not sure if i have the syntax right there, but you get the idea).

    also, you have an index on the preferred_users.id column too, right? and a foreign key relation from one to the other? if not, try with that first.