Search code examples
mysqlsqlrleft-joinquery-performance

How do I make my query using left join to be faster?


I have three tables: users (id), projects (id_project, owner_id) and followers (repo_id, user_id). I would like count how many followers one project has. I would like to return all user projects, with and without followers.

Where: id = owner_id = user_id id_project = repo_id

I report more than 1,000 users in my query. I did this:

 rs = dbSendQuery(mydb, "select p.id_project, p.owner_id, count(f.user_id) from users u left outer join projects p on p.owner_id = u.id and u.id in (123, 526, 852) left outer join followers f on p.id_project = f.repo_id group by p.id;")     

The query is too slow. Could anyone give me any suggestions to make the query faster? Am I doing something wrong?

Maybe, I can break into two queries, but how do I get the result of the first (which are the projects of the users) and add in the second query (where I would have the number of followers of the projects) in R?

I am using R and mysql.

Regards, Thaciana


Solution

  • Sometimes switching to a correlated subquery can speed such queries:

    select p.id_project, p.owner_id, count(f.user_id)
    from users u left outer join
         projects p
         on p.owner_id = u.id and u.id in (123, 526, 852) left outer join
         followers f
         on p.id_project = f.repo_id
    group by p.id;    
    

    For this query, you want indexes on users(id), projects(owner_id, id_project), and followers(repo_id, user_id).

    I note that you are not really using the users table. So, this should do what you want:

    select p.id_project, p.owner_id, count(f.user_id)
    from projects p left outer join
         followers f
         on p.id_project = f.repo_id
    where p.owner_id in (123, 526, 852)
    group by p.id;   
    

    The same indexes should work on this query, although the one on users is obviously not needed.

    Next, in MySQL, correlated subqueries are sometimes faster than aggregation queries. So, you can try:

    select p.id_project, p.owner_id,
           (select count(*)
            from followers f
            where p.id_project = f.repo_id
           ) as num_followers
    from projects p        
    where p.owner_id in (123, 526, 852);