I'm using PostgreSQL and PgAdmin 4 and I'm working with MusicBrainz database. I need to find the couples of label that have never released a release in common, but that they both have released a release with a third label (the same label for both).
In the database there are these tables: label (id, name..) id is primary key. release_label (id, release, label) id is primary key and label foreign key.
I've tried with self join but it is not working:
SELECT l1.name as label_1 , l2.name as label_2
FROM release_label as r1 INNER JOIN label as l1 ON r1.label=l1.id, label as l2
INNER JOIN (release_label as r2 LEFT JOIN release_label as r3
ON r3.label=r2.label)ON r2.label=l2.id WHERE r1.release != r2.release
AND r1.label!= r3.label AND r1.release=r3.release
GROUP BY label_1,label_2 ORDER BY label_1,label_2
Thanks in advice.
This query gets pairs of labels that have never released anything in common:
select l1.id as id1, l2.id as id2
from label l1 cross join
label l2 left join
release_label rl1
on l1.id = rl1.label left join
release_label rl2
on l2.id = rl2.label and rl2.release = rl1.release
where rl1.label is null and l1.id < l2.id;
Now, you want a third label that has released with both . . .
select ll.*, rl3_1.label as in_common
from (select l1.id as id1, l2.id as id2
from label l1 cross join
label l2 left join
release_label rl1
on l1.id = rl1.label left join
release_label rl2
on l2.id = rl2.label and rl2.release = rl1.release
where rl1.label is null and l1.id < l2.id
) ll join
release_label rl1
on rl1.label = ll.id1 join
release_label rl2
on rl2.label = ll.id2 join
release_label rl3_1
on rl3_1.release = rl1.release join
release_label rl3_2
on rl3_2.release = rl2.release and
rl3_2.label = rl3_1.label;
EDIT:
An alternative method might be simpler:
select l1.id, l2.id, l3.id as in_common_id
from label l1 join
label l2
on l1.id < l2.id join
label l3
on l1.id <> l3.id and l2.id <> l3.id
where -- have no releases in common
not exists (select 1
from release_label rl1 join
release_label rl2
on rl1.release = rl2.release
where rl1.label = l1.id and rl2.label = l2.id
) and
-- l1 has a release with l3
exists (select 1
from release_label rl1 join
release_label rl3
on rl1.release = rl3.release
where rl1.label = l1.id and rl3.label = l3.id
) and
-- l2 has a release with l3
exists (select 1
from release_label rl2 join
release_label rl3
on rl2.release = rl3.release
where rl2.label = l2.id and rl3.label = l3.id
);
The from
clause generates all candidate trips of labels. The exists
checks the three conditions you want to check. This is the version that I would use, because I think the logic is much easier to follow.
In either of these queries, you can (of course) use select distinct
on the first two ids to just get the pairs you are looking for.