I have two tables:
poll_response (poll_id, option_id, user_id) (about 500,000 rows, with 500 unique polls, 1000 unique options, and 25000 unique users)
preferred_users (user_id) (about 800 rows)
I'd like to determine what percentage of the users who select each option are "preferred users" (i.e. those with high reputation). Other users can respond to the poll; in order to identify that the response came from a preferred user, the join is required to the preferred_users table.
Here's what I've got:
SELECT option_id, count(*) AS all_votes, count(preferred_users.user_id) AS preferred_votes
FROM response
LEFT JOIN preferred_users ON response.user_id = preferred_users.user_id
GROUP BY option_id
The query spits out a table like this:
| option_id | all_votes | preferred_votes
| 1 | 500 | 150
| 2 | 550 | 250
| 3 | 525 | 300
I can then do the math to determine the percentage.
The problem is that the query often times out -- meaning it takes well over a minute to complete.
Is there any way to get rid of the left join or otherwise optimize the query?
have you tried splitting it into two queries - one for the total and one for the preferred users? i suspect that what is causing it to run slowly is running through the entries in group counting the non-null entries (but you could see yourself by using explain).
in other words:
select option_id, count(*) from response group by option_id
select option_id, count(*) from response, preferred_users where response.user_id = preferred_user.id group by option_id
you could even join them:
select * from
(select option_id, count(*) as total from response group by option_id
left join
select option_id, count(*) as preferred from response, preferred_users where response.user_id = preferred_user.id group by option_id
using (option_id))
(not sure if i have the syntax right there, but you get the idea).
also, you have an index on the preferred_users.id column too, right? and a foreign key relation from one to the other? if not, try with that first.