Search code examples
sqlpostgresqljoinself-joinsql-function

Self joining a table n times


I'm trying to find a biological "find core genome of a given set of organisms" problem. Essentially, given a list of organisms, find all genes that are common to them. To abstract away biology, you can think of find all colours that are favourite amongst a list of people (one person can have multiple favourite colours).

Database table would look like:

name | fav_colour
john | red
john | blue
john | green
jason | red
jason | blue
matt | red
matt | teal

User can specify a set of names like [john, jason] to get out [red, blue], or [john] to get [red, blue, green], or [john, jason, matt] to get [red].

I'm trying to solve this by doing n number of self joins where n is the number of names supplied.

Is there any way for me to do n number of self joins of tables to solve this problem for any number of names supplied? I tried to look for a way to do this via Postgres functions but can't figure out the n number of self joins part... any help or pointers towards the correct direction would be appreciated.

And no, unfortunately I can't change the schema to do these type of queries easier.


Solution

  • I don't think you need self-joins for this. You can use aggregation and a HAVING clause:

    with t(name, fav_colour) as (
          values ('john', 'red'),
                 ('john', 'blue'),
                 ('john', 'green'),
                 ('jason', 'red'),
                 ('jason', 'blue'),
                 ('matt', 'red'),
                 ('matt', 'teal')
         )
    select fav_colour
    from t
    where name in ('john', 'jason')
    group by fav_colour
    having count(*) = 2;
    

    The value "2" is the number of names in the list. Changing the IN list and the count are the only changes you need.