I'm trying to find a biological "find core genome of a given set of organisms" problem. Essentially, given a list of organisms, find all genes that are common to them. To abstract away biology, you can think of find all colours that are favourite amongst a list of people (one person can have multiple favourite colours).
Database table would look like:
name | fav_colour
john | red
john | blue
john | green
jason | red
jason | blue
matt | red
matt | teal
User can specify a set of names like [john, jason] to get out [red, blue], or [john] to get [red, blue, green], or [john, jason, matt] to get [red].
I'm trying to solve this by doing n number of self joins where n is the number of names supplied.
Is there any way for me to do n number of self joins of tables to solve this problem for any number of names supplied? I tried to look for a way to do this via Postgres functions but can't figure out the n number of self joins part... any help or pointers towards the correct direction would be appreciated.
And no, unfortunately I can't change the schema to do these type of queries easier.
I don't think you need self-joins for this. You can use aggregation and a HAVING
clause:
with t(name, fav_colour) as (
values ('john', 'red'),
('john', 'blue'),
('john', 'green'),
('jason', 'red'),
('jason', 'blue'),
('matt', 'red'),
('matt', 'teal')
)
select fav_colour
from t
where name in ('john', 'jason')
group by fav_colour
having count(*) = 2;
The value "2" is the number of names in the list. Changing the IN
list and the count are the only changes you need.