Search code examples
postgresqlpivot-table

PostgreSQL Crosstab Query: return and sql tuple descriptions are incompatible


I have a query that should pivot a table with large number of values in a certain column that should be transformed into the column names of the pivoted table, i.e.,

   SELECT *
   FROM   crosstab(
   $$SELECT user_id, item_id, rating
   FROM   tbl
   ORDER  BY 1,2$$  
   ) AS ct ("user_id" varchar(50), "select distinct item_id from tbl ORDER BY 1" varchar(50)[]);

This attempt results in an error: return and sql tuple descriptions are incompatible. I am sure that the type of user_id and item_id columns is varchar(50). However, I am not sure what should be the type of the second argument that is a query.

This query has been generated following the example given in the first answer at PostgreSQL Crosstab Query. It mentions that the second argument can be a query.

Any help to fix this problem would be great. Thank you.

Edit:

For the purpose of providing MNE, the table tbl can be created by:

   CREATE TABLE tbl (
   user_id   varchar(50)
 , item_id   varchar(50)
 , rating    integer
);

INSERT INTO tbl VALUES 
  ('A', 'item_0', 1), ('A', 'item_3', 2)
, ('B', 'item_1', 4), ('B', 'item_2', 5)
                    , ('C', 'item_0', 3);

The desired outcome is:

user_id     item_0     item_1     item_2     item_3
A           1                                2
B                      4          5
C           3

Note that, in the actual scenario, the number of distinct values in the item_id column of table tbl is about 2000.


Solution

  • crosstab is an overloaded function and it seems that the version you're after is the one that takes two string arguments: crosstab ( source_sql text, category_sql text ) → setof record.

    So your query should look something like (I can't test it since your questions doesn't provide a sample input and output):

    select * from crosstab(
      -- 1st query: source_sql
      'SELECT user_id, item_id, rating FROM tbl ORDER BY 1,2',
    
      -- 2nd query: category_sql
      'select distinct item_id from tbl ORDER BY 1'
    ) AS (user_id varchar(50), ...);
    

    The two query strings are actually arguments to the crosstab call, not part of the specifier situated after the AS keyword.


    Edit: Here's a version of the query that works on your sample data, but it also shows that the number of categories needs to be fixed:

    select * from crosstab(
      'select * from tbl',
      'select distinct item_id from tbl order by item_id asc limit 4'
    ) as (
      user_id varchar(50),
      -- The number of items here must match the LIMIT clause above and you must
      -- make sure that the query will indeed return 4 item_id's.
      item_1 varchar(50),
      item_2 varchar(50),
      item_3 varchar(50),
      item_4 varchar(50)
    );