Search code examples
sqlpostgresqlinner-joinrelational-division

SQL Query to find list of values from rows in a table 'B' contained in groups of a table 'A'


create table A {
   id   text,
   val  text }

create table B {
   val0 text, 
   val1 text,
   val2 text }

id and val of table A contain each a large, but discrete number of values in an many-to-many relationship. Sample data looks like follows:

id val
one a
one b
one y
two a
two d
two e
two x
three c
four c
four f
four z
four g

The second table B contains combinations from the same set as those for val, for example:

val0 val1 val2
a b c
a d e
c f g
z f c

I am trying to find a query that returns those, and only those, id's of table A for which there are val that cover at least one entire row from table B.

In the example above, the query should return id "two", because that id-group contains all values from the second row in table B, i.e. "a", "d", and "e". It is irrelevant that "x" is contained in that group as well. The query also should return "four", as the values in that group cover now rows 3 and 4 in table B. The order in table A or table B is not important.

I managed to do the query as a procedure. What is a concise SQL query that works efficiently with tables with millions of rows?

Post scriptum: Both answers provided by @Ajax1234 and @GMB (second answer that does not introduce a pk) work well. However, the left-join based solution by @Ajax1234 was significantly more performant on the real dataset: it took about 9 seconds for @Ajax1234 solution vs. more than a minute for the solution by @GMB


Solution

  • Using a series of left joins:

    select distinct a.id 
    from B b 
    left join A a on b.val0 = a.val 
    left join A a1 on b.val1 = a1.val and a.id = a1.id 
    left join A a2 on b.val2 = a2.val and a2.id = a.id 
    where a.id is not null and a1.id is not null and a2.id is not null
    

    See fiddle.