I have a crosstab()
query like the following:
SELECT *
FROM crosstab(
'SELECT row_name, extra1, extra2..., another_table.category, value
FROM table t
JOIN another_table ON t.field_id = another_table.field_id
WHERE t.field = certain_value AND t.extra1 = val1
ORDER BY row_name ASC',
'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)
Simplified example, the actual query is really complex and contains important information. The above query returns N result rows after filtering with table.extra1 = val1
.
When I change the query as follows:
SELECT *
FROM crosstab(
'SELECT row_name, extra1, extra2..., another_table.category, value
FROM table t
JOIN another_table ON t.field_id = another_table.field_id
WHERE t.field = certain_value AND t.extra1 IN (val1, ...) --> more values
ORDER BY row_name ASC',
'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)
WHERE extra1 = val1; --> condition on the result
Added more possible values table.extra1 IN (val1, ...)
and a final condition WHERE extra1 = val1
. Now I get fewer rows than from the original one. To make it worse, if I add yet more values to IN (val1, ...)
, I get yet fewer rows. Why is that?
extra1, extra2, ...
are "extra columns" in crosstab terminology.
The manual for the tablefunc module explains the rules:
It may also have one or more “extra” columns. The
row_name
column must be first. The category andvalue
columns must be the last two columns, in that order. Any columns betweenrow_name
andcategory
are treated as “extra”. The “extra” columns are expected to be the same for all rows with the samerow_name
value.
And further down:
The output
row_name
column, plus any “extra” columns, are copied from the first row of the group.
Bold emphasis on key parts by me.
You only sort by row_name
:
ORDER BY row_name ASC
Does not matter in the first example where you filter with:
WHERE ... t.extra1 = 'val1' -- single quotes by me
All input row have extra1 = 'val1'
anyway. But it matters in the second example where you filter with:
WHERE ... t.extra1 IN('val1', ...) --> More values
Now, the first bolded requirement above is violated for the extra column extra1
. While the sort order of the first input query is non-deterministic, resulting values for the "extra" column extra1
are picked arbitrarily. The more possible values for extra1
, the fewer rows will end up having 'val1': that's what you observed.
You can still make it work: to report extra1 = 'val1'
for every row_name
that has at least one of those, change the ORDER BY
to:
ORDER BY row_name, (extra1 <> 'val1')
Sorts 'val1' on top. Explanation for that boolean
expression (with links to more):
Other "extra" columns are still chosen arbitrarily while the sort order is not deterministic.
Crosstab basics: