I have the following tables,
Given a set of book_ids, I want to form the following result,
result_set structure -> genre_id, genre_name, count(book_id).
I wrote this query,
SELECT one.genre_id,
one.genre_name,
two.count
FROM genres as one,(SELECT genre_id,
count(book_id) as count
FROM link_f2_books_lists GROUP BY genre_id) as two
WHERE one.genre_id = two.genre_id;
I don't know if that's the best solution, but I want this to be optimized if possible or if it is well formed, validated.
P.S. It's done with ruby on rails, so any rails oriented approach would also be fine.
Your query is not using the SQL-92 JOIN
syntax but the older implicit join syntax. It's time (20 years now), you should start using it.
It's also not very good to use keywords like COUNT
for aliases. You could use cnt
or book_count
instead:
SELECT one.genre_id,
one.genre_name,
two.cnt
FROM
genres AS one
INNER JOIN
( SELECT genre_id,
COUNT(book_id) AS cnt
FROM link_f2_books_lists
GROUP BY genre_id
) AS two
ON one.genre_id = two.genre_id ;
MySQL usually is a bit faster with COUNT(*)
, so if book_id
cannot be NULL
, changing COUNT(book_id)
to COUNT(*)
will be a small performance improvement.
Off course you can rewrite the Join without the derived table:
SELECT one.genre_id,
one.genre_name,
COUNT(*) AS cnt
FROM
genres AS one
INNER JOIN
link_f2_books_lists AS two
ON one.genre_id = two.genre_id
GROUP BY one.genre_id ;
In both versions, you can change INNER JOIN
to LEFT OUTER JOIN
in order genres without any books (0 count) to be shown. But then do use COUNT(two.book_id)
and not COUNT(*)
, for correct results.
The above versions (and yours) will not include those genres (that's one good reason to use the JOIN
syntax, the change needed is very simple. Try that with your WHERE
version!)
The LEFT JOIN
versions can also be written like this:
SELECT one.genre_id,
one.genre_name,
( SELECT COUNT(*)
FROM link_f2_books_lists AS two
WHERE one.genre_id = two.genre_id
) AS cnt
FROM
genres AS one ;
Regarding performance, there is nothing better than testing yourself. It all depends on the version of MySQL you use (newer versions will have better optimizer that can select through more options to create an execution plan and possibly it will identify different versions as equivalent), the size of your tables, the indexes you have, the distribution of the data (how many different genres? how many books per genre on average? etc), your memory (and other MySQL) settings and probably many other factors that I'm forgetting now.
An advice is that an index on (genre_id, book_id)
will be useful in most cases, for all the versions.
As a general advice, it's usually good to have both a (genre_id, book_id)
and a (book_id, genre_id)
index on the many-to-many table.