Search code examples
postgresqlquery-optimization

Postgres - Is "not exists" slower than join?


I'm trying to locate the cause of a slow query that hits 3 tables with records ranging from a few hundred thousand to a several million

  • tango - 6166101
  • kilo_golf - 822805
  • three_romeo - 535782

Version

  • PostgreSQL 11.10

Current query

select count(*) as aggregate
from "tango"
where "lima" = juliet
  and not exists(select 1
                 from "three_romeo" quebec_seven_oscar
                 where quebec_seven_oscar.six_two = tango.six_two
                   and quebec_seven_oscar."romeo" >= six_seven
                   and quebec_seven_oscar."three_seven" in
                       ('partial_survey', 'survey_completed', 'wrong_number', 'moved'))
  and ("mike" <= '2021-02-03 13:26:22' or "mike" is null)
  and not exists(select 1
                 from "kilo_golf" as "delta"
                 where "delta"."to" = tango.six_two
                   and "two" = november
                   and "delta"."romeo" >= '2021-02-05 13:49:15')
  and not exists(select 1
                 from "three_romeo" as "four"
                 where "four".foxtrot = tango.quebec_seven_victor
                   and "four"."three_seven" in ('deceased', 'block_calls', 'block_all'))
  and "tango"."yankee" is null;

This is the analysis of the query in its current state - https://explain.depesz.com/s/Di51

It feels like the problematic area is in the tango table

  • tango.lima is equal to 'juliet' in the majority of records (low cardinality), we don't currently have an index on this
  • The long filter makes me wonder if I should create some sort of composite index?

After reading another post (https://stackoverflow.com/a/50148594/682754) tried removing the or "mike" is null and this helped quite a lot

https://explain.depesz.com/s/XgmB

Should I try and remove the not exists in favour of using joins?

Thanks


Solution

  • I don't think that using explicit joins will help you, since PostgreSQL converts NOT EXISTS into an anti-join anyway.

    But you spotted the problem: it is the OR. I would recommend that you use a dynamic query: add the cindition only if mikeis not NULL rather than having a static query with OR.