Search code examples
sqlpostgresqlpgadminself-joinmusicbrainz

SQL - Self Join to find two label with a release in common with a third label


I'm using PostgreSQL and PgAdmin 4 and I'm working with MusicBrainz database. I need to find the couples of label that have never released a release in common, but that they both have released a release with a third label (the same label for both).

In the database there are these tables: label (id, name..) id is primary key. release_label (id, release, label) id is primary key and label foreign key.

I've tried with self join but it is not working:

SELECT l1.name as label_1 , l2.name as label_2
FROM release_label as r1 INNER JOIN label as l1 ON r1.label=l1.id, label as l2
INNER JOIN (release_label as r2 LEFT JOIN release_label as r3
ON r3.label=r2.label)ON r2.label=l2.id WHERE r1.release != r2.release 
AND r1.label!= r3.label AND r1.release=r3.release
GROUP BY label_1,label_2 ORDER BY label_1,label_2

Thanks in advice.


Solution

  • This query gets pairs of labels that have never released anything in common:

    select l1.id as id1, l2.id as id2
    from label l1 cross join
         label l2 left join
         release_label rl1 
         on l1.id = rl1.label left join
         release_label rl2
         on l2.id = rl2.label and rl2.release = rl1.release
    where rl1.label is null and l1.id < l2.id;
    

    Now, you want a third label that has released with both . . .

    select ll.*, rl3_1.label as in_common
    from (select l1.id as id1, l2.id as id2
          from label l1 cross join
               label l2 left join
               release_label rl1 
               on l1.id = rl1.label left join
               release_label rl2
               on l2.id = rl2.label and rl2.release = rl1.release
          where rl1.label is null and l1.id < l2.id
         ) ll join
         release_label rl1
         on rl1.label = ll.id1 join
         release_label rl2
         on rl2.label = ll.id2 join
         release_label rl3_1
         on rl3_1.release = rl1.release join 
         release_label rl3_2
         on rl3_2.release = rl2.release and
            rl3_2.label = rl3_1.label;
    

    EDIT:

    An alternative method might be simpler:

    select l1.id, l2.id, l3.id as in_common_id
    from label l1 join
         label l2
         on l1.id < l2.id join
         label l3
         on l1.id <> l3.id and l2.id <> l3.id
    where -- have no releases in common
          not exists (select 1
                      from release_label rl1 join
                           release_label rl2
                           on rl1.release = rl2.release
                      where rl1.label = l1.id and rl2.label = l2.id
                     ) and
          -- l1 has a release with l3
          exists (select 1
                  from release_label rl1 join
                       release_label rl3
                       on rl1.release = rl3.release
                      where rl1.label = l1.id and rl3.label = l3.id
                 ) and
          -- l2 has a release with l3
          exists (select 1
                  from release_label rl2 join
                       release_label rl3
                       on rl2.release = rl3.release
                      where rl2.label = l2.id and rl3.label = l3.id
                 );
    

    The from clause generates all candidate trips of labels. The exists checks the three conditions you want to check. This is the version that I would use, because I think the logic is much easier to follow.

    In either of these queries, you can (of course) use select distinct on the first two ids to just get the pairs you are looking for.