Search code examples
sqlpostgresqlpivot-tablecrosstab

Unexpected effect of filtering on result from crosstab() query with multiple values


I have a crosstab() query similar to the one in my previous question:
Unexpected effect of filtering on result from crosstab() query

The common case is to filter extra1 field with multiples values: extra1 IN(value1, value2...). For each value included on the extra1 filter, I have added an ordering expression like this (extra1 <> valueN), as appear on the above mentioned post. The resulting query is as follows:

SELECT *
FROM crosstab(
 'SELECT row_name, extra1, extra2..., another_table.category, value
  FROM   table t
  JOIN   another_table ON t.field_id = another_table.field_id
  WHERE  t.field = certain_value AND t.extra1 IN (val1, val2, ...) --> more values
  ORDER  BY row_name ASC, (extra1 <> val1), (extra1 <> val2)', ... --> more ordering expressions
 'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(extra1, extra2...)
WHERE extra1 = val1; --> condition on the result

The first value of extra1 included on the ordering expression value1, get the correct resulting rows. However, the following ones value2, value3..., get wrong number of results, resulting on less rows on each one. Why is that?

UPDATE:

Giving this as our source table (table t):

+----------+--------+--------+------------------------+-------+
| row_name | Extra1 | Extra2 | another_table.category | value |
+----------+--------+--------+------------------------+-------+
| Name1    | 10     | A      | 1                      | 100   |
| Name2    | 11     | B      | 2                      | 200   |
| Name3    | 12     | C      | 3                      | 150   |
| Name2    | 11     | B      | 3                      | 150   |
| Name3    | 12     | C      | 2                      | 150   |
| Name1    | 10     | A      | 2                      | 100   |
| Name3    | 12     | C      | 1                      | 120   |
+----------+--------+--------+------------------------+-------+

And this as our category table:

+-------------+--------+
| category_id | value  |
+-------------+--------+
| 1           | Cat1   |
| 2           | Cat2   |
| 3           | Cat3   |
+-------------+--------+

Using the CROSSTAB, the idea is to get a table like this:

+----------+--------+--------+------+------+------+
| row_name | Extra1 | Extra2 | cat1 | cat2 | cat3 |
+----------+--------+--------+------+------+------+
| Name1    | 10     | A      | 100  | 100  |      |
| Name2    | 11     | B      |      | 200  | 150  |
| Name3    | 12     | C      | 120  | 150  | 150  |
+----------+--------+--------+------+------+------+

The idea is to be able to filter the resulting table so I get results with Extra1 column with values 10 or 11, as follow:

+----------+--------+--------+------+------+------+
| row_name | Extra1 | Extra2 | cat1 | cat2 | cat3 |
+----------+--------+--------+------+------+------+
| Name1    | 10     | A      | 100  | 100  |      |
| Name2    | 11     | B      |      | 200  | 150  |
+----------+--------+--------+------+------+------+

The problem is that on my query, I get different result size for Extra1 with 10 as value and Extra1 with 11 as value. With (Extra1 <> 10) I can get the correct result size on Extra1 for that value but not in the case of 11 as value.

Here is a fiddle demonstrating the problem in more detail:

https://dbfiddle.uk/?rdbms=postgres_11&fiddle=5c401f7512d52405923374c75cb7ff04


Solution

  • All "extra" columns are copied from the first row of the group (as pointed out in my previous answer)

    While you filter with:

    .... WHERE extra1 = 'val1';
    

    ...it makes no sense to add more ORDER BY expressions on the same column. Only rows that have at least one extra1 = 'val1' in their source group survive.

    From your various comments, I guess you might want to see all distinct existing values of extra - within the set filtered in the WHERE clause - for the same unixdatetime. If so, aggregate before pivoting. Like:

    SELECT * 
    FROM   crosstab(
       $$
       SELECT unixdatetime, x.extras, c.name, s.value
       FROM  (
          SELECT unixdatetime, array_agg(extra) AS extras
          FROM  (
             SELECT DISTINCT unixdatetime, extra
             FROM   source_table   s
             WHERE  extra IN (1, 2)     -- condition moves here
             ORDER  BY unixdatetime, extra
             ) sub
          GROUP  BY 1
          ) x
       JOIN   source_table   s USING (unixdatetime)
       JOIN   category_table c ON c.id = s.gausesummaryid
       ORDER  BY 1
       $$
     , $$SELECT unnest('{trace1,trace2,trace3,trace4}'::text[])$$
    ) AS final_result (unixdatetime int
                     , extras int[]
                     , trace1 numeric
                     , trace2 numeric
                     , trace3 numeric
                     , trace4 numeric);
    

    Aside: advice given in the following related answer about the 2nd function parameter applies to your case as well:

    I demonstrate a static 2nd parameter query above. While being at it, you don't need to join to category_table at all. The same, a bit shorter and faster, yet:

    SELECT * 
    FROM   crosstab(
       $$
       SELECT unixdatetime, x.extras, s.gausesummaryid, s.value
       FROM  (
          SELECT unixdatetime, array_agg(extra) AS extras
          FROM  (
             SELECT DISTINCT unixdatetime, extra
             FROM   source_table
             WHERE  extra IN (1, 2)     -- condition moves here
             ORDER  BY unixdatetime, extra
             ) sub
          GROUP  BY 1
          ) x
       JOIN   source_table s USING (unixdatetime)
       ORDER  BY 1
       $$
    ,  $$SELECT unnest('{923,924,926,927}'::int[])$$
    ) AS final_result (unixdatetime int
                     , extras int[]
                     , trace1 numeric
                     , trace2 numeric
                     , trace3 numeric
                     , trace4 numeric);
    

    db<>fiddle here - added my queries at the bottom of your fiddle.