Search code examples
postgresqlpivot-tabledynamic-sqlcrosstabpostgres-crosstab

Pivot table using crosstab and count


I have to display a table like this:

Year Month Delivered Not delivered Not Received
2021 Jan 10 86 75
2021 Feb 13 36 96
2021 March 49 7 61
2021 Apr 3 21 72

Using raw data generated by this query:

SELECT 
    year,
    TO_CHAR( creation_date, 'Month') AS month,
    marking,
    COUNT(*) AS count 
FROM invoices
GROUP BY 1,2,3

I have tried using crosstab() but I got error:

SELECT * FROM crosstab('
    SELECT 
        year,
        TO_CHAR( creation_date, ''Month'') AS month,
        marking,
        COUNT(*) AS count 
    FROM invoices
    GROUP BY 1,2,3
') AS ct(year text, month text, marking text)

I would prefer to not manually type all marking values because they are a lot.

ERROR:  invalid source data SQL statement
DETAIL:  The provided SQL must return 3 columns: rowid, category, and values.

Solution

  • 1. Static solution with a limited list of marking values :

    SELECT year
         , TO_CHAR( creation_date, 'Month') AS month
         , COUNT(*) FILTER (WHERE marking = 'Delivered') AS Delivered
         , COUNT(*) FILTER (WHERE marking = 'Not delivered') AS "Not delivered"
         , COUNT(*) FILTER (WHERE marking = 'Not Received') AS "Not Received"
    FROM invoices
    GROUP BY 1,2
    

    2. Full dynamic solution with a large list of marking values :

    This proposal is an alternative solution to the crosstab solution as proposed in A and B.

    The proposed solution here just requires a dedicated composite type which can be dynamically created and then it relies on the jsonb type and standard functions :

    Starting from your query which counts the number of rows per year, month and marking value :

    • Using the jsonb_object_agg function, the resulting rows are first aggregated by year and month into jsonb objects whose jsonb keys correspond to the marking values and whose jsonb values correspond to the counts.
    • the resulting jsonb objects are then converted into records using the jsonb_populate_record function and the dedicated composite type.

    First we dynamically create a composite type which corresponds to the ordered list of marking values :

    CREATE OR REPLACE PROCEDURE create_composite_type() LANGUAGE plpgsql AS $$
    DECLARE
      column_list text ;
    BEGIN
      SELECT string_agg(DISTINCT quote_ident(marking) || ' bigint', ',' ORDER BY quote_ident(marking) || ' bigint' ASC)
        INTO column_list
        FROM invoices ;
      
      EXECUTE 'DROP TYPE IF EXISTS composite_type' ;
      EXECUTE 'CREATE TYPE composite_type AS (' || column_list || ')' ;
    END ;
    $$ ;
    
    CALL create_composite_type() ;
    

    Then the expected result is provided by the following query :

    SELECT a.year
         , TO_CHAR(a.year_month, 'Month') AS month
         , (jsonb_populate_record( null :: composite_type
                                 , jsonb_object_agg(a.marking, a.count)
                                 )
           ).*
     FROM
        ( SELECT year
               , date_trunc('month', creation_date) AS year_month
               , marking
               , count(*) AS count
            FROM invoices AS v
           GROUP BY 1,2,3
        ) AS a
    GROUP BY 1,2
    ORDER BY month
    

    Obviously, if the list of marking values may vary in time, then you have to recall the create_composite_type() procedure just before executing the query. If you don't update the composite_type, the query will still work (no error !) but some old marking values may be obsolete (not used anymore), and some new marking values may be missing in the query result (not displayed as columns).

    See the full demo in dbfiddle.