Search code examples
postgresqlcrosstab

Create Postgresql Crosstab Query with multiple categories


So far I have created a query that can give me the following result:

+------------+----------+-------------+-------+
|    date    | category | subcategory | count |
+------------+----------+-------------+-------+
| 2020-04-23 | One      | First       |     1 |
| 2020-04-23 | Two      | Second      |     1 |
| 2020-04-23 | Two      | First       |     3 |
| 2020-04-23 | Three    | Third       |     3 |
| 2020-04-23 | Three    | Second      |     1 |
| 2020-04-23 | Four     | Second      |     2 |
| 2020-04-23 | Five     | Third       |     3 |
| 2020-04-23 | Five     | Second      |     1 |
| 2020-04-23 | Five     | First       |     1 |
| 2020-04-23 | Six      | Third       |     1 |
| 2020-04-23 | Six      | Second      |     2 |
+------------+----------+-------------+-------+

I would like to turn it into the following but I can't figure it out:

+------------+----------+-------+--------+-------+
|    date    | category | First | Second | Third |
+------------+----------+-------+--------+-------+
| 2020-04-23 | One      |     1 |      0 |     0 |
| 2020-04-23 | Two      |     2 |      3 |     0 |
| 2020-04-23 | Three    |     0 |      1 |     3 |
| 2020-04-23 | Five     |     1 |      2 |     3 |
| 2020-04-23 | Six      |     0 |      2 |     1 |
+------------+----------+-------+--------+-------+

I tried the following, but it looks like you have to return a row, column, and value when using crosstab so it doesn't work:

SELECT *
FROM crosstab(
    $$
        SELECT date, category, subcategory, count(*)
        -- ...
        GROUP BY 1, 2, 3
    $$
)
AS ct(date date, category text, First int, Second int, Third int);

Is there a way to use multiple values for the row indicator when using crosstab, or will I have to find some other approach?


Solution

  • I found a solution, though it's not ideal: merge the first two with a unique character, get the crosstab query with a CTE, and split the resulting columns. It looks something like:

    WITH crosstb AS (
        SELECT *
        FROM crosstab(
            $$
            -- Concatenate date and category columns
            SELECT date || '_' || category, subcategory, count(*)
            -- ...
            GROUP BY 1, 2
            $$,
            -- Specify data columns to ensure null-count rows are included
            $$VALUES ('First'), ('Second'), ('Third')$$
        )
        AS ct(datecat text, First numeric, Second numeric, Third numeric)
    )
    SELECT 
        -- Split datecat column into separate again
        (regexp_split_to_array(datecat, '_'))[1]::date as Date, 
        (regexp_split_to_array(datecat, '_'))[2] as category, 
        COALESCE(First, 0),
        COALESCE(Second, 0),
        COALESCE(Third, 0)
    FROM crosstb;