Search code examples
sqlpostgresqlforeign-data-wrapper

Postgres foreign data wrapper issue with 'is null' where clause


I'm trying to build a report with a query, accessing different postgres DB via FDW.

And I'm guessing why does it work this way. First query without where clause is fine:

SELECT s.student_id, p.surname 
FROM rep_student s inner JOIN rep_person p ON p.id = s.person_id

But adding where caluse make this query a hundred times slower (40s vs 0.1s):

SELECT s.student_id, p.surname 
FROM rep_student s inner JOIN rep_person p ON p.id = s.person_id
WHERE s.learning_end_date IS NULL

Result for EXPLAIN VERBOSE:

Nested Loop  (cost=200.00..226.39 rows=1 width=734)
  Output: s.student_id, p.surname
  Join Filter: ((s.person_id)::text = (p.id)::text)
  ->  Foreign Scan on public.rep_student s  (cost=100.00..111.80 rows=1 width=436)
        Output: s.student_id, s.version, s.person_id, s.curriculum_flow_id, s.learning_start_date, s.learning_end_date, s.learning_end_reason, s.last_update_timestamp, s.aud_created_ts, s.aud_created_by, s.aud_last_updated_ts, s.aud_last_updated_by
        Remote SQL: SELECT student_id, person_id FROM public.rep_student WHERE ((learning_end_date IS NULL))
  ->  Foreign Scan on public.rep_person p  (cost=100.00..113.24 rows=108 width=734)
        Output: p.id, p.version, p.surname, p.name, p.middle_name, p.birthdate, p.info, p.photo, p.last_update_timestamp, p.is_archived, p.gender, p.aud_created_ts, p.aud_created_by, p.aud_last_updated_ts, p.aud_last_updated_by, p.full_name
        Remote SQL: SELECT id, surname FROM public.rep_person`

Result for EXPLAIN ANALYZE:

Nested Loop  (cost=200.00..226.39 rows=1 width=734) (actual time=27.138..38996.303 rows=939 loops=1)
  Join Filter: ((s.person_id)::text = (p.id)::text)
  Rows Removed by Join Filter: 15194898
  ->  Foreign Scan on rep_student s  (cost=100.00..111.80 rows=1 width=436) (actual time=0.685..4.259 rows=939 loops=1)
  ->  Foreign Scan on rep_person p  (cost=100.00..113.24 rows=108 width=734) (actual time=1.380..39.094 rows=16183 loops=939)
Planning time: 0.251 ms
Execution time: 38997.914 ms

Data count for tables is relatively small. Almost all rows in student table has NULL in learning_end_date column.

Student ~ 1000 rows. Persons ~ 15000.

It seems that Postgres has issues with filtering NULLs with FDW, because this query executes fast again:

SELECT s.student_id, p.surname 
FROM rep_student s inner JOIN rep_person p ON p.id = s.person_id
WHERE s.learning_start_date < current_date

Result for EXPLAIN VERBOSE:

Hash Join  (cost=214.59..231.83 rows=36 width=734)
  Output: s.student_id, p.surname
  Hash Cond: ((s.person_id)::text = (p.id)::text)
  ->  Foreign Scan on public.rep_student s  (cost=100.00..116.65 rows=59 width=436)
        Output: s.student_id, s.version, s.person_id, s.curriculum_flow_id, s.learning_start_date, s.learning_end_date, s.learning_end_reason, s.last_update_timestamp, s.aud_created_ts, s.aud_created_by, s.aud_last_updated_ts, s.aud_last_updated_by
        Filter: (s.learning_start_date < ('now'::cstring)::date)
        Remote SQL: SELECT student_id, person_id, learning_start_date FROM public.rep_student"
  ->  Hash  (cost=113.24..113.24 rows=108 width=734)
        Output: p.surname, p.id
        ->  Foreign Scan on public.rep_person p  (cost=100.00..113.24 rows=108 width=734)
              Output: p.surname, p.id
              Remote SQL: SELECT id, surname FROM public.rep_person`

Result for EXPLAIN ANALYZE:

Hash Join  (cost=214.59..231.83 rows=36 width=734) (actual time=41.614..46.347 rows=940 loops=1)
  Hash Cond: ((s.person_id)::text = (p.id)::text)
  ->  Foreign Scan on rep_student s  (cost=100.00..116.65 rows=59 width=436) (actual time=0.718..3.829 rows=940 loops=1)
        Filter: (learning_start_date < ('now'::cstring)::date)
  ->  Hash  (cost=113.24..113.24 rows=108 width=734) (actual time=40.812..40.812 rows=16183 loops=1)
        Buckets: 16384 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 921kB
        ->  Foreign Scan on rep_person p  (cost=100.00..113.24 rows=108 width=734) (actual time=2.252..35.079 rows=16183 loops=1)
Planning time: 0.208 ms
Execution time: 47.176 ms

Tried to add index on learning_end_date but didn't experience any effect.

What do I need to change to make query execute faster with 'IS NULL' where clause? Any ideas will be appreciated!


Solution

  • Your problem is that you do not have good table statistics on those foreign tables, so the row count estimates of the PostgreSQL optimizer are pretty arbitrary.

    That causes the optimizer to choose a nested loop join in the case you report as slow, which is an inappropriate plan.

    It is just by coincidence that this happens for a certain IS NULL condition.

    Collect statistics on the foreign tables with

    ANALYZE rep_student;
    ANALYZE rep_person;
    

    Then the performance will be much better.

    Note that while autovacuum automatically gathers statistics for local tables, it does not do that for remote tables because it does not know how many rows have changed, so you should regularly ANALYZE foreign tables whose data change.