I want to count the number of times a "child" appears in an N..N (many-to-many) relationship, and group the results by a field of the "parent".
I'm having a hard time putting exact words on this, so let's say I have 3 tables: Movie, Actor, Play, where Play is the relation between Movie and Actor. An actor can play in a movie. It is possible some actors exist in the database, but have never played in any movie of the database.
In the Movie table, I have a genre.
I want to count the number of actors that have played per genre, without counting the same actor more than once per genre.
I'm currently using DISTINCT on the actor_id per movie, which only means that an actor who has theoretically played several times in the same movie is counted once. That scenario is nonsensical (an actor appears only once per movie in the Play table), so the DISTINCT is useless there, and it is not what I want. Any better idea?
I'd like to keep everything in a single query, because I am actually doing other subqueries to get other statistics per genre.
Here is what my query looks like, without the other subqueries:
SELECT
movie.genre,
SUM(
SELECT COUNT(DISTINCT play.actor_id)
FROM play
WHERE play.movie_id = movie.id
) AS number_of_actors
FROM movie
GROUP BY movie.genre
Currently, if an actor has played in several movies, he will be counted several times.
Your problem is that DISTINCT is applied inside each per-movie subquery, so the outer SUM has no way of knowing that an actor was already counted for another movie of the same genre, and you end up counting that actor once per movie. It's simpler to write this as a JOIN and count the distinct actor_id values per genre directly. I've used a LEFT JOIN in case a movie has no entries in the play table, in which case the COUNT will be 0.
SELECT m.genre,
COALESCE(COUNT(DISTINCT p.actor_id), 0) AS number_of_actors
FROM movie m
LEFT JOIN play p ON p.movie_id = m.id
GROUP BY m.genre
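To see the difference between the two approaches, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented for illustration, SQLite is only an assumption (the question does not say which database is used), and the per-movie query has extra parentheses around the subquery so that it parses.

```python
import sqlite3

# In-memory database with made-up sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, genre TEXT);
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE play  (movie_id INTEGER, actor_id INTEGER);

    INSERT INTO movie VALUES (1, 'comedy'), (2, 'comedy'), (3, 'drama'), (4, 'horror');
    INSERT INTO actor VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- Actor 1 plays in both comedies, movie 4 has no cast, actor 3 never plays.
    INSERT INTO play VALUES (1, 1), (2, 1), (2, 2), (3, 1);
""")

# Approach from the question: distinct count per movie, then summed per genre.
per_movie = conn.execute("""
    SELECT m.genre,
           SUM((SELECT COUNT(DISTINCT p.actor_id)
                FROM play p
                WHERE p.movie_id = m.id)) AS number_of_actors
    FROM movie m
    GROUP BY m.genre
    ORDER BY m.genre
""").fetchall()

# Approach from the answer: join first, then count distinct actors per genre.
joined = conn.execute("""
    SELECT m.genre,
           COALESCE(COUNT(DISTINCT p.actor_id), 0) AS number_of_actors
    FROM movie m
    LEFT JOIN play p ON p.movie_id = m.id
    GROUP BY m.genre
    ORDER BY m.genre
""").fetchall()

print(per_movie)  # actor 1 is counted twice for 'comedy'
print(joined)     # actor 1 is counted once for 'comedy'
```

With this data the per-movie sum reports 3 actors for comedy while the joined query reports 2, and the genre whose only movie has no cast still appears with a count of 0 thanks to the LEFT JOIN.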