postgresql, group-by, correlated-subquery

SQL: select the actors who play together most often


There are many variations of actor/movie queries, but I can't find the correct approach when I try to select non-grouped columns. I have three tables: film(film_id, title), actor(actor_id, first_name, last_name), and film_actor(actor_id, film_id). I need to find the two actors who appear together most often and show the output as actor1_last_name, actor2_last_name, film_title for the first 5 such films. First, I count how often each pair of actors appears together:

select r1.actor_id as actor_a, r2.actor_id as actor_b,
count(r1.film_id) as casted_together
from film_actor r1 inner join film_actor r2 on r1.film_id = r2.film_id
and r1.actor_id > r2.actor_id
group by r1.actor_id, r2.actor_id
order by casted_together desc

This returns a table like:

actor_a | actor_b| casted_together
Name      Name      7
...       ...       6

But when I try to add film.title to the select list with a subquery, (select title from film where film_id = r1.film_id) as film_title, it is rejected because r1.film_id is not in the GROUP BY clause: subquery uses ungrouped column "r1.film_id" from outer query

How can I take the results of the grouping above and add an outer column to them? In my case the column is film.title.

Desired output:

actor_a_last_name | actor_b_last_name| film_title
Name                Name                Title
...                 ...                 ...

Say George Clooney and Brad Pitt have the most films together (9). The result should then show the first 5 of those 9 rows:

    actor_a_last_name | actor_b_last_name| film_title
     Clooney                Pitt            Film 1
     Clooney                Pitt            Film 2
     ...                    ...             ...

Solution

  • Use an aggregate, e.g. string_agg() or array_agg():

    select 
        r1.actor_id as actor_a, 
        r2.actor_id as actor_b,
        count(r1.film_id) as casted_together,
        string_agg(f.title, ', ') as films
    from film_actor r1 
    join film_actor r2 
        on r1.film_id = r2.film_id
        and r1.actor_id > r2.actor_id
    join film f
        on r1.film_id = f.film_id
    group by r1.actor_id, r2.actor_id
    order by casted_together desc
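
    If only the first few titles per pair are wanted rather than the full list, one possible variation is to aggregate the titles into an array and keep a slice of it. This is a sketch that assumes PostgreSQL array slicing; the alias first_five_films, the alphabetical ordering of titles, and the final limit 1 are illustrative choices, not part of the query above.

```sql
select
    r1.actor_id as actor_a,
    r2.actor_id as actor_b,
    count(*) as casted_together,
    -- sort the titles inside the aggregate, then keep at most the first five
    (array_agg(f.title order by f.title))[1:5] as first_five_films
from film_actor r1
join film_actor r2
    on r1.film_id = r2.film_id
    and r1.actor_id > r2.actor_id
join film f
    on f.film_id = r1.film_id
group by r1.actor_id, r2.actor_id
order by casted_together desc
limit 1;  -- only the pair with the most shared films
```

    This still returns one row per pair, with the titles packed into a single array column, so it does not yet match the one-row-per-film layout asked for in the question.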
    

    To get one row per film, use the query as a derived table limited to the top pair, then join it to actor and film to add the last names and film titles:

    select
        a1.last_name as actor_a_last_name,
        a2.last_name as actor_b_last_name,
        f.title as film_title
    from (
        -- the pair of actors with the most films in common
        select
            r1.actor_id as actor_a_id,
            r2.actor_id as actor_b_id,
            count(r1.film_id) as casted_together,
            array_agg(r1.film_id) as film_ids
        from film_actor r1
        join film_actor r2
            on r1.film_id = r2.film_id
            and r1.actor_id > r2.actor_id
        group by r1.actor_id, r2.actor_id
        order by casted_together desc
        limit 1
        ) s
    join actor a1 on a1.actor_id = s.actor_a_id
    join actor a2 on a2.actor_id = s.actor_b_id
    -- one output row per film the pair shares
    join film f on f.film_id = any(s.film_ids)
    -- show only the first 5 shared films, as the question asks
    order by f.title
    limit 5
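
The same result can probably also be reached without an array, by picking the top pair in a CTE and joining it back to film_actor to find the shared films. The following is a minimal sketch of that alternative, not the method used above. It assumes (actor_id, film_id) is unique in film_actor; the names top_pair, p, fa1 and fa2 are made up for illustration, and the extra sort keys are only there to make the choice deterministic when several pairs tie.

```sql
with top_pair as (
    -- the pair of actors sharing the most films
    select
        r1.actor_id as actor_a_id,
        r2.actor_id as actor_b_id,
        count(*) as casted_together
    from film_actor r1
    join film_actor r2
        on r1.film_id = r2.film_id
        and r1.actor_id > r2.actor_id
    group by r1.actor_id, r2.actor_id
    order by casted_together desc, r1.actor_id, r2.actor_id
    limit 1
)
select
    a1.last_name as actor_a_last_name,
    a2.last_name as actor_b_last_name,
    f.title      as film_title
from top_pair p
-- films of actor A that actor B also appears in
join film_actor fa1 on fa1.actor_id = p.actor_a_id
join film_actor fa2 on fa2.actor_id = p.actor_b_id
                   and fa2.film_id  = fa1.film_id
join film  f  on f.film_id   = fa1.film_id
join actor a1 on a1.actor_id = p.actor_a_id
join actor a2 on a2.actor_id = p.actor_b_id
order by f.title
limit 5;  -- first 5 shared films, in alphabetical order
```

Because it uses only joins, grouping and limit, this form avoids the PostgreSQL-specific array_agg() and any() at the cost of scanning film_actor a second time.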