mysql ruby-on-rails join query-optimization derived-table

mysql derived tables, performance, alternative

I have the following tables,

link_books_genres, *table structure -> book_id,genre_id*
genres, *table structure -> genre_id,genre_name*

Given a set of book_ids, I want to form the following result,

result_set structure -> genre_id, genre_name, count(book_id).

I wrote this query,

SELECT one.genre_id,
       one.genre_name, 
       two.count 
FROM   genres as one,(SELECT genre_id,
                   count(book_id) as count 
                   FROM link_f2_books_lists GROUP BY genre_id) as two 
WHERE  one.genre_id = two.genre_id;

I don't know if that's the best solution, but I want this to be optimized if possible or if it is well formed, validated.

P.S. It's done with ruby on rails, so any rails oriented approach would also be fine.

Solution

Your query is not using the SQL-92 JOIN syntax but the older implicit join syntax. It's time (20 years now), you should start using it.

It's also not very good to use keywords like COUNT for aliases. You could use cnt or book_count instead:

SELECT one.genre_id,
       one.genre_name, 
       two.cnt 
FROM   
       genres AS one
     INNER JOIN
       ( SELECT genre_id,
                COUNT(book_id) AS cnt 
         FROM   link_f2_books_lists 
         GROUP BY genre_id
       ) AS two 
           ON one.genre_id = two.genre_id ;

MySQL usually is a bit faster with COUNT(*), so if book_id cannot be NULL, changing COUNT(book_id) to COUNT(*) will be a small performance improvement.

Off course you can rewrite the Join without the derived table:

SELECT one.genre_id,
       one.genre_name, 
       COUNT(*) AS cnt 
FROM   
       genres AS one
     INNER JOIN
       link_f2_books_lists AS two 
           ON one.genre_id = two.genre_id 
GROUP BY  one.genre_id ;

In both versions, you can change INNER JOIN to LEFT OUTER JOIN in order genres without any books (0 count) to be shown. But then do use COUNT(two.book_id) and not COUNT(*), for correct results.

The above versions (and yours) will not include those genres (that's one good reason to use the JOIN syntax, the change needed is very simple. Try that with your WHERE version!)

The LEFT JOIN versions can also be written like this:

SELECT one.genre_id,
       one.genre_name, 
       ( SELECT COUNT(*) 
         FROM   link_f2_books_lists AS two 
         WHERE  one.genre_id = two.genre_id 
       ) AS cnt 
FROM   
       genres AS one ;

Regarding performance, there is nothing better than testing yourself. It all depends on the version of MySQL you use (newer versions will have better optimizer that can select through more options to create an execution plan and possibly it will identify different versions as equivalent), the size of your tables, the indexes you have, the distribution of the data (how many different genres? how many books per genre on average? etc), your memory (and other MySQL) settings and probably many other factors that I'm forgetting now.

An advice is that an index on (genre_id, book_id) will be useful in most cases, for all the versions.

As a general advice, it's usually good to have both a (genre_id, book_id) and a (book_id, genre_id) index on the many-to-many table.