I have three tables: users (id), projects (id_project, owner_id) and followers (repo_id, user_id). I would like count how many followers one project has. I would like to return all user projects, with and without followers.
Where: id = owner_id = user_id id_project = repo_id
I report more than 1,000 users in my query. I did this:
rs = dbSendQuery(mydb, "select p.id_project, p.owner_id, count(f.user_id) from users u left outer join projects p on p.owner_id = u.id and u.id in (123, 526, 852) left outer join followers f on p.id_project = f.repo_id group by p.id;")
The query is too slow. Could anyone give me any suggestions to make the query faster? Am I doing something wrong?
Maybe, I can break into two queries, but how do I get the result of the first (which are the projects of the users) and add in the second query (where I would have the number of followers of the projects) in R?
I am using R and mysql.
Regards, Thaciana
Sometimes switching to a correlated subquery can speed such queries:
select p.id_project, p.owner_id, count(f.user_id)
from users u left outer join
projects p
on p.owner_id = u.id and u.id in (123, 526, 852) left outer join
followers f
on p.id_project = f.repo_id
group by p.id;
For this query, you want indexes on users(id)
, projects(owner_id, id_project)
, and followers(repo_id, user_id)
.
I note that you are not really using the users
table. So, this should do what you want:
select p.id_project, p.owner_id, count(f.user_id)
from projects p left outer join
followers f
on p.id_project = f.repo_id
where p.owner_id in (123, 526, 852)
group by p.id;
The same indexes should work on this query, although the one on users
is obviously not needed.
Next, in MySQL, correlated subqueries are sometimes faster than aggregation queries. So, you can try:
select p.id_project, p.owner_id,
(select count(*)
from followers f
where p.id_project = f.repo_id
) as num_followers
from projects p
where p.owner_id in (123, 526, 852);