I am testing some queries over a bunch of materialized views. All of them have the same structure, like this one:
EXPLAIN ANALYZE SELECT mr.foo, ..., CAST(SUM(mr.bar) AS INTEGER) AS stuff
FROM foo.bar mr
WHERE
mr.a = 'TRUE' AND
mr.b = 'something' AND
mr.c = '12'
GROUP BY
mr.a,
mr.b,
mr.c;
Obviously the system is giving me a different query plan for each one of them, but if (and only if) a WHERE clause involves a boolean column (like in the examples), the planner always sorts the result set before finishing. Example:
Finalize GroupAggregate (cost=16305.92..16317.98 rows=85 width=21) (actual time=108.301..108.301 rows=1 loops=1)
Group Key: festivo, nome_strada, ora
-> Gather Merge (cost=16305.92..16315.05 rows=70 width=77) (actual time=108.279..109.015 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=15305.90..15306.95 rows=35 width=77) (actual time=101.422..101.422 rows=1 loops=3)
Group Key: festivo, nome_strada, ora
-> Sort (cost=15305.90..15305.99 rows=35 width=21) (actual time=101.390..101.395 rows=28 loops=3)
Sort Key: festivo
Sort Method: quicksort Memory: 25kB
-> Parallel Seq Scan on sft_vmv3_g3 mr (cost=0.00..15305.00 rows=35 width=21) (actual time=75.307..101.329 rows=28 loops=3)
Filter: (festivo AND ((nome_strada)::text = '16th St'::text) AND (ora = '12'::smallint))
Rows Removed by Filter: 277892
I am really curios about this kind of approach, but I still haven't found an explaination about this.
I'm curious why you wouldn't phrase the logic as:
SELECT true as a, 'something' as b, '12' as c, CAST(SUM(mr.bar) as INTEGER)
FROM foo.bar as mr
WHERE mr.a AND
mr.b = 'something' AND
mr.c = '12';
This an aggregation query (because of the SUM()
in the SELECT
) and does not have an explicit GROUP BY
. I think it should produce a more optimal execution plan. In addition, it will always return one row, even if no rows match the condition.