Search code examples
mysqlsqlcountinner-join

SQL Finding similarities


"Determine if there are actors with the same first name who appeared in the same movie."

This is my task and I'm supposed to do that with subqueries and I just dont really know what else to do. I tried everything with group by, order by and having count but I just don't get to the point where I get the same first name actors with the same movie.

Maybe someone can help me? I am using Sakila Database

    SELECT 
a.first_name
,(a.last_name)
,a.actor_id
, f.title
FROM actor a 

JOIN film_actor fa ON fa.actor_id = a.actor_id
JOIN film f ON f.film_id = fa.film_id

JOIN(SELECT b.first_name, COUNT(*)
     FROM actor B
     GROUP BY b.first_name
     HAVING COUNT(*) > 1 ) b 
     ON a.first_name = b.first_name
     
     GROUP BY a.last_name
     HAVING COUNT(f.title) > 1
     
     
     
     ORDER BY a.first_name

Solution

  • How about this:

    SELECT f.title, f.film_id, a.first_name, a.last_name, a.actor_id
    FROM actor a 
    JOIN film_actor fa ON fa.actor_id = a.actor_id
    JOIN film f ON f.film_id = fa.film_id
    WHERE a.first_name IN (
             SELECT a2.first_name
             FROM actor a2
             JOIN film_actor fa2 ON fa2.actor_id = a2.actor_id
             JOIN film f2 ON f2.film_id = fa2.film_id
             WHERE a2.actor_id <> a.actor_id AND f2.film_id = f.film_id
    )
    ORDER BY f.title ASC, a.last_name ASC, a.first_name ASC
    

    Explaining query step-by-step

    SELECT the needed fields from the joined tables

    JOIN the necessary tables

    WHERE (here is the subquery) a.first_name is in the set of:

    firstnames of actors, different than the current actor (a2.actor_id <> a.actor_id) and the film is the same (f2.film_id = f.film_id)

    The subquery in where is a similar select with joins query as the parent query.

    PS:

    One can do variations on this basic query template:

    1. Eg film_id can be given as parameter, so one can find all actors with same name for specific film.
    2. Also one can group and count how many actors appeared in same film wih same name eg by grouping on film_id and counting.
    3. One can even optimise a bit the query by removing unnecessary joins (eg film.title may not be needed at all) and so on..

    The advantage of having single results returned (instead of tuples or aggregates) is that number of actors with same name in same film is not fixed and manipulating the results, eg by grouping and counting or getting further info for each actor, is easier.

    The price is a slightly more complex and potentialy slower query.