mysql count correlated-subquery distinct-values

Subquery to count items, and then group them by a field of the main query without duplicates

I want to count the number of times a "child" appears in an N..N relationship, and group the results by a field of the "parent".

I'm having a hard time putting exact words on this, so let's say I have 3 tables: Movie, Actor, Play, where Play is the relation between Movie and Actor. An actor can play in a movie. It is possible some actors exist in the database, but have never played in any movie of the database.

In the Movie table, I have a genre.

I want to count the number of actors that have played per genre, without counting the same actor more than once per genre.

I'm currently using DISTINCT on the actor_id per movie, which only de-duplicates an actor who appears several times in the same movie. That is a nonsensical scenario (an actor appears at most once per movie in the Play table), so the DISTINCT is useless there and does not do what I want. Any better idea?

I'd like to keep everything in a single query, because I am actually doing other subqueries to get other statistics per genre.

Here is what my query looks like, without the other subqueries:

SELECT
    movie.genre,

    SUM((
        SELECT COUNT(DISTINCT play.actor_id)
        FROM play
        WHERE play.movie_id = movie.id
    )) AS number_of_actors

FROM movie

GROUP BY movie.genre

Currently, if an actor has played in several movies of the same genre, that actor is counted once per movie instead of once per genre.

Solution

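Your problem is that the correlated subquery runs once per movie, so DISTINCT only removes duplicates within a single movie. Once those per-movie counts are summed, an actor who appears in several movies of the same genre is counted several times. It's simpler to write this as a JOIN, so that COUNT(DISTINCT ...) is applied across the whole genre group. I've used a LEFT JOIN in case a movie has no entries in the play table; COUNT ignores the resulting NULLs, so a genre whose movies have no actors gets a count of 0.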

SELECT m.genre,
       COALESCE(COUNT(DISTINCT p.actor_id), 0) AS number_of_actors
FROM movie m
LEFT JOIN play p ON p.movie_id = m.id
GROUP BY m.genre
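
To check the behaviour, here is a small sketch with made-up data. The column types, the composite primary key on play, and the sample rows are assumptions for illustration, not your actual schema. Actor 10 plays in both drama movies, and the horror movie has no cast. The second query is one possible way to keep the count as a correlated subquery (correlated on the genre instead of the movie), in case that fits better next to your other per-genre subqueries.

```sql
-- Hypothetical minimal schema, assumed for illustration only.
CREATE TABLE movie (id INT PRIMARY KEY, genre VARCHAR(20));
CREATE TABLE play (
    movie_id INT,
    actor_id INT,
    PRIMARY KEY (movie_id, actor_id)  -- an actor appears once per movie
);

INSERT INTO movie (id, genre) VALUES
    (1, 'drama'), (2, 'drama'), (3, 'comedy'), (4, 'horror');

-- Actor 10 is in both drama movies; movie 4 (horror) has no rows here.
INSERT INTO play (movie_id, actor_id) VALUES
    (1, 10), (1, 11), (2, 10), (3, 12);

-- JOIN version: DISTINCT is applied over the whole genre group.
SELECT m.genre,
       COUNT(DISTINCT p.actor_id) AS number_of_actors
FROM movie m
LEFT JOIN play p ON p.movie_id = m.id
GROUP BY m.genre
ORDER BY m.genre;
-- comedy 1, drama 2, horror 0

-- Subquery version: one row per genre, subquery correlated on the genre.
SELECT g.genre,
       (SELECT COUNT(DISTINCT p.actor_id)
        FROM play p
        JOIN movie m2 ON m2.id = p.movie_id
        WHERE m2.genre = g.genre) AS number_of_actors
FROM (SELECT DISTINCT genre FROM movie) g
ORDER BY g.genre;
-- comedy 1, drama 2, horror 0
```

With the same data, summing the per-movie counts as in the question would give 3 for drama (2 for movie 1 plus 1 for movie 2), which is the duplication described above.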