Search code examples
sqlsql-optimization

Optimizing sub-queries and joins


I have 2 tables that can have many-to-many relations between them:

Person (pid, name, a,b) , 
Attributes (attribId, d,e)

The mapping is present in a separate table:

Mapping (mapId, pid, attribId)

The goal is to get all Person and Attributes values for a person who qualifies the filter criteria. The filter criteria is based on a column in the Attributes table. Eg - column d.

For example:

Person ->
(1,'person1','a1','b1')
(2,'person2','a1','b1')

Attributes ->
(1,'d1','e1')
(2,'d2','e1')
(3,'d3','e1')
(4,'d3','e2')

Mapping ->
(1,1,1)
(2,1,2)
(3,1,3)

After running the query ->
Result:
(1,'person1','a1','b1')(1,'d1','e1')
(1,'person1','a1','b1')(2,'d2','e1')
(1,'person1','a1','b1')(3,'d3','e1')

Query that i have been trying ->

select p.*, a.*
from
    Person p 
left outer join
    Mapping m 
        on p.pid=m.pid 
left outer join
    Attributes a 
        on m.attribId=a.attribId 
where
    p.pid in (select p1.pid
from
    Person p1 
left outer join
    Mapping m1 
        on p1.pid=m1.pid 
left outer join
    Attributes a1
        on m1.attribId=a1.attribId 
where
    a1.d = 'd1')

Similarly, I also have to discard Person entries that have a certain d value.

So, currently, the final query looks like this:

SELECT
  p.*,
  a.*
FROM Person p
LEFT OUTER JOIN Mapping m
  ON p.pid = m.pid
LEFT OUTER JOIN Attributes a
  ON m.attribId = a.attribId
WHERE p.pid IN (SELECT
  p1.pid
FROM Person p1
LEFT OUTER JOIN Mapping m1
  ON p1.pid = m1.pid
LEFT OUTER JOIN Attributes a1
  ON m1.attribId = a1.attribId
WHERE a1.d = 'd1')
AND p.pid NOT IN (SELECT
  p2.pid
FROM Person p2
LEFT OUTER JOIN Mapping m2
  ON p2.pid = m2.pid
LEFT OUTER JOIN Attributes a2
  ON m2.attribId = a2.attribId
WHERE a2.d = 'd5');

It feels like this query is inefficient since the same join is done at 3 places. Is there a way to reuse the join for all the sub-queries and make this more efficient?

sqlfiddle demo


Solution

  • You can get all persons satisfying the filter using:

    select m.pid
    from mapping m join
         attributes a
         on m.attribId = a.attribId and a.d = 'dS';
    

    You can get all person/attribute combinations using IN or EXISTS or a JOIN. Which is better depends on the database. But the idea is:

    select p.*, a.*
    from person p join
         mapping m
         on p.pid = m.pid join
         attributes a
         on m.attribId = a.attribId
    where p.pid in (select m.pid
                    from mapping m join
                         attributes a
                         on m.attribId = a.attribId and a.d = 'dS'
                   );
    

    I see no reason to have left joins for these queries.

    EDIT:

    If the filter criteria is based on multiple columns, then use group by and having for the subquery:

    select m.pid
    from mapping m join
         attributes a
         on m.attribId = a.attribId and a.d = 'dS'
    group by m.pid
    having sum(case when a.d = 'dS' then 1 else 0 end) > 0 and -- at least one of these
           sum(case when a.d = 'd1' then 1 else 0 end) = 0;    -- none of these