Search code examples
sqlpostgresqlsql-execution-planexplainquery-planner

Why does PostgreSQL sorts on a boolean WHERE condition?


I am testing some queries over a bunch of materialized views. All of them have the same structure, like this one:

EXPLAIN ANALYZE SELECT mr.foo, ..., CAST(SUM(mr.bar) AS INTEGER) AS stuff
    FROM foo.bar mr
    WHERE
        mr.a = 'TRUE' AND
        mr.b = 'something' AND
        mr.c = '12'
    GROUP BY
        mr.a,
        mr.b,
        mr.c;

Obviously the system is giving me a different query plan for each one of them, but if (and only if) a WHERE clause involves a boolean column (like in the examples), the planner always sorts the result set before finishing. Example:

Finalize GroupAggregate  (cost=16305.92..16317.98 rows=85 width=21) (actual time=108.301..108.301 rows=1 loops=1)
   Group Key: festivo, nome_strada, ora
   ->  Gather Merge  (cost=16305.92..16315.05 rows=70 width=77) (actual time=108.279..109.015 rows=2 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial GroupAggregate  (cost=15305.90..15306.95 rows=35 width=77) (actual time=101.422..101.422 rows=1 loops=3)
               Group Key: festivo, nome_strada, ora
               ->  Sort  (cost=15305.90..15305.99 rows=35 width=21) (actual time=101.390..101.395 rows=28 loops=3)
                     Sort Key: festivo
                     Sort Method: quicksort  Memory: 25kB
                     ->  Parallel Seq Scan on sft_vmv3_g3 mr  (cost=0.00..15305.00 rows=35 width=21) (actual time=75.307..101.329 rows=28 loops=3)
                           Filter: (festivo AND ((nome_strada)::text = '16th St'::text) AND (ora = '12'::smallint))
                           Rows Removed by Filter: 277892

I am really curios about this kind of approach, but I still haven't found an explaination about this.


Solution

  • I'm curious why you wouldn't phrase the logic as:

    SELECT true as a, 'something' as b, '12' as c, CAST(SUM(mr.bar) as INTEGER)
    FROM foo.bar as mr
    WHERE mr.a AND
          mr.b = 'something' AND
          mr.c = '12';
    

    This an aggregation query (because of the SUM() in the SELECT) and does not have an explicit GROUP BY. I think it should produce a more optimal execution plan. In addition, it will always return one row, even if no rows match the condition.