Search code examples
postgresqlpivotcrosstab

Pivot in Postgresql with TRUE/FALSE markings


I wonder how to put several array values into the column names, with TRUE/FALSE values. I'll give you the concrete example:

What I have is repeated rows, with the last column duplicate because of different results:

DATE        ID    Species   Illness         Tag
20180101    001   Dog       Asthma          Mucus
20180101    001   Dog       Asthma          Noisy
20180101    001   Dog       Asthma          Respiratory
20180102    002   Cat       Osteoarthritis  Locomotor
20180102    002   Cat       Osteoarthritis  Limp
...
20180131    003   Bird      Avian Pox       Itchy

What I want to get is this:

DATE        ID    Species   Illness      Mucus  Noisy ... Limp  Itchy 
20180101    001   Dog       Asthma       TRUE   TRUE  ... FALSE FALSE
20180102    002   Cat       Osteoarth.   FALSE  FALSE ... TRUE  FALSE
...
20180131    003   Bird      Avian Pox    FALSE  FALSE ... FALSE TRUE

I tried with "crosstab" function just for the part of tags, but it gives me error of unexisting function:

 select * 
 from crosstab (
   'select c.id, tg."name"  
    FROM taggings t 
    join consultations c
      on c.id=t.taggable_id
    join tags tg 
      on t.tag_id=tg.id
    group by c.id, tg."name"'
 ) as final_result(dermatological BOOLEAN, behaviour BOOLEAN)

Btw. I have around 350 tags, so it's not the most optimal function :/

EDIT: Finally I add tablefunc extension, and I tried with crosstab(), but I got the following error:

Query execution failed Reason: SQL Error [22023]: ERROR: invalid source data SQL statement Detail: The provided SQL must return 3 columns: rowid, category, and values.

I'll try to find a solution and update it here, but in the meantime if someone knows how to solve it, please share :) Thanks!


After days of reading and trying suggested solutions, this worked for me:

What I did is to get 3 separate tables, and then join the first and the third to get the info I need, plus the tags as columns with the value 1/0 if the tag exists in the certain ID. One more edit => I didn’t actually need the date, so I based the tables on IDs of consultations.

TABLE 1: Get a table of all the columns you need grouped by IDs, and get all the tags one ID has.

ID    Species   Age      Illness         Tag
001   Dog        2       Asthma          Mucus
001   Dog        2       Asthma          Noisy
001   Dog        2       Asthma          Respiratory
002   Cat        5       Osteoarthritis  Locomotor
002   Cat        5       Osteoarthritis  Limp
...
003   Bird       1       Avian Pox       Itchy

TABLE 2: Get the Cartesian product that will cross all the consultations with a list of all distinct tags, and order them for crosstab() function. (crosstab function needs to have 3 columns; ID, tags and values)

With consultation_tags as
    (here put the query of the TABLE 1),
tag_list as
    (select tags."name"
    from tags
    join taggings t on t.tag_id = tags.id
    join consultations c on c.id = t.taggable_id a
    group by 1), —-> gets the list of all possible tags in the DB 
cartesian_consultations_tags as
    (select consultations_tags.id, tag_list.name,
     case when tag_list.name = consultations_tags.tag_name then 1
     else 0  --> "case" gets the value 1/0 if the tag is present in an ID
     end as tag_exists
    from
    consultations_tags
    cross join 
    tag_list)
select cartesian_consul_tags.id, cartesian_consul_tags.name, 
SUM(cartesian_consul_tags.tag_exists) --> for me, the values were duplicated, and so were tags
from cartesian_consul_tags
group by 1, 2
order by 1, 2

—> the order of tags is really important here, because you are the one who names the columns in crosstab function; it doesn’t transform certain tag to a column, it only transfers the value of that tag position, so if you mess the naming order, the values will not correspond correctly.

TABLE 3: Crosstab of the second table —> it pivots the cartesian product table, or in this case the TABLE 2.

SELECT * 
FROM crosstab(‘ COPY THE TABLE 2 ‘) --> if you have some conditions like “where species = ‘Dogs’”, you will need to put double apostrophe in the string value —> where species = ‘’Dogs’’
AS ct(id int4,”Itchy” int8,
“Limp” int8,
“Locomotor” int8,
“Mucus” int8,
“Noisy” int8) --> your tag list. You can prepare it in excel, so all the tags are in quotation marks and has corresponding datatype. The datatype of the tags has to be the same as the datatype of the “value” in the table 2 

FINALLY, the final table I wanted was to join the tables 1 and 3, so I have info I need from the consultation IDs, and a list of tags as columns with the values 0/1 if the tag is present in certain consultation.

with table1 as ( Copy the query of table1),
table3 as ( Copy the query of table3)
select *
from table1
join table3 on 
table1.id=table3.id 
order by 1

And the final table looks like this:

ID    Species   Illness      Mucus  Noisy ... Limp  Itchy 
001   Dog       Asthma       1      1     ... 0     0
002   Cat       Osteoarth.   0      0     ... 1     0
...
003   Bird      Avian Pox    0      0     ... 0     1  

Solution

  • Depending on how you display the results of the query, you might consider a different approach where you get all true/false flags per tag in a single JSONB column, rather than 350 dynamic columns.

    I am not sure if I understood your data model correctly, but from what I gathered I think it's something like this:

    create table tags (id int, tag text);
    create table consultations (id int, species text, illness text);
    create table taggings (taggable_id int, tag_id int);
    
    insert into tags 
      (id, tag)
    values
      (1, 'Mucus'),
      (2, 'Noisy'),
      (3, 'Limp'),
      (4, 'Itchy'),
      (5, 'Locomotor'),
      (6, 'Respiratory');
    
    insert into consultations
      (id, species, illness)
    values 
      (1, 'Dog', 'Asthma'),
      (2, 'Cat', 'Osteoarthritis'),
      (3, 'Bird', 'Avian Pox');
    
    insert into taggings 
      (taggable_id, tag_id)
    values 
      (1, 1), (1, 2), (1, 6), -- the dog
      (2, 5), (2, 3), -- the cat 
      (3, 4); -- the bird
    

    Then you can get a single JSON column using this query:

    select c.id, c.species, c.illness, 
           (select jsonb_object_agg(t.tag, tg.taggable_id is not null)
            from tags t 
              left join taggings tg 
                     on tg.tag_id = t.id 
                    and tg.taggable_id = c.id) as tags
    from consultations c;
    

    With the above sample data the query returns:

    id | species | illness        | tags                                                                                                    
    ---+---------+----------------+---------------------------------------------------------------------------------------------------------
     1 | Dog     | Asthma         | {"Limp": false, "Itchy": false, "Mucus": true, "Noisy": true, "Locomotor": false, "Respiratory": true}  
     2 | Cat     | Osteoarthritis | {"Limp": true, "Itchy": false, "Mucus": false, "Noisy": false, "Locomotor": true, "Respiratory": false} 
     3 | Bird    | Avian Pox      | {"Limp": false, "Itchy": true, "Mucus": false, "Noisy": false, "Locomotor": false, "Respiratory": false}
    

    An alternative way of writing the query is to use a lateral join:

    select c.id, c.species, c.illness, ti.tags
    from consultations c
      left join lateral (
        select jsonb_object_agg(t.tag, tg.taggable_Id is not null) as tags
        from tags t 
          left join taggings tg on tg.tag_id = t.id and tg.taggable_id = c.id
      ) as ti on true