Search code examples
sqljoinsqlitedistinct

Select distinct pairs joining a table to itself in sql


I have a table with two rows: IMDB_ID and Actor. I am trying to find the pairs of actors who co-stared in 3 or more movies. The pairs of names should be unique, meaning that ‘actor A, actor B’ and ‘actor B, actor A’ are the same pair, so only one of them should appear. Here's a few lines of the table, but not the whole thing:

IMDB_ID     ACTOR      
----------  -----------
tt0111161   Tim Robbins
tt0111161   Morgan Free
tt0111161   Bob Gunton 
tt0111161   William Sad
tt0111161   Clancy Brow
tt0111161   Gil Bellows
tt0111161   Mark Rolsto
tt0111161   James Whitm
tt0111161   Jeffrey DeM
tt0111161   Larry Brand
tt0111161   Neil Giunto
tt0111161   Brian Libby
tt0111161   David Prova
tt0111161   Joseph Ragn
tt0111161   Jude Ciccol
tt0068646   Marlon Bran
tt0068646   Al Pacino  

I tried:

SELECT DISTINCT movie_actor.actor, movie_actor.actor, COUNT(movie_actor.actor) AS occurrence 
   FROM movie_actor join movie_actor 
   ON movie_actor.imdb_id = movies.imdb_id 
   WHERE occurrence >= 3
     GROUP BY movie_actor.actor 
     ORDER BY occurrence DESC, movie_actor.actor ASC;

and received an operational error that said: ambiguous column name: movie_actor.actor


Solution

  • You should be using table aliases when using a self-join. However, your query has numerous problems, indicating that you need more practice in writing SQL. Some good habits:

    • Always use table aliases that are abbreviations of the table name.
    • Always qualify the column names
    • Remember to use a having clause when that is your intention.

    Here is SQL closer to what you want to do:

    SELECT ma.actor, ma2.actor, COUNT(*) AS occurrence 
    FROM movie_actor ma join
         movie_actor ma2
         ON ma.imdb_id = ma2.imdb_id 
    GROUP BY ma.actor, ma2.actor
    HAVING COUNT(*) >= 3
    ORDER BY occurrence DESC, ma.actor ASC;