Search code examples
sqlpostgresqljoinsumgenerate-series

generate_series() not working as expected with sum in PostgreSQL


I have some table called classification that contains classification_indicator_id.
I need to sum this ID and put in 1 day series.
I need to add around 20 columns (with another classification_indicator_id).
I modified a bit answer from previous question:

select
data.d::date as "data",
sum(c.classification_indicator_id)::integer as "Segment1",
sum(c4.classification_indicator_id)::integer as "Segment2",
sum(c5.classification_indicator_id)::integer as "Segment3"
from 
  generate_series(
    '2013-03-25'::timestamp without time zone,
    '2013-04-01'::timestamp without time zone,
    '1 day'::interval
) data(d)
left join classifications c on (data.d::date = c.created::date and c.classification_indicator_id = 3)
left join classifications c4 on (data.d::date = c4.created::date and c4.classification_indicator_id = 4)
left join classifications c5 on (data.d::date = c5.created::date and c5.classification_indicator_id = 5)
group by "data"
ORDER BY "data"

But still not working properly. sum for each row is to big, and growing when I add additional columns. In second table with 4 columns in segment1 for 2013-03-26 should be the same amount like in first table etc.

 With 3 column                      With 4 columns
data       | Segment1 | Segment2   data       | Segment1 | Segment2 | Segment3
--------------------------------   -------------------------------------------
2013-03-25 | 12       | 16         2013-03-25 | 12       | 16       | 20
--------------------------------   -------------------------------------------
2013-03-26 | 18       | 24         2013-03-26 | 108      | 144      | 180    

Solution

  • As commented under your previous answer, you are running into a "proxy cross join".
    I explained it in more detail in this related answer:
    Two SQL LEFT JOINS produce incorrect result

    Your query should work like this:

    SELECT d.created AS data
          ,c3.segment1
          ,c4.segment2
          ,c5.segment3
    FROM (
       SELECT generate_series('2013-03-25'::date
                             ,'2013-04-01'::date
                             ,interval '1 day')::date AS created
        ) d
    LEFT JOIN (
        SELECT created
              ,sum(classification_indicator_id)::integer AS segment1
        FROM   classifications
        WHERE  classification_indicator_id = 3
        GROUP  BY 1
        ) c3 USING (created)
    LEFT JOIN (
        SELECT created
              ,sum(classification_indicator_id)::integer AS segment2
        FROM   classifications
        WHERE  classification_indicator_id = 4
        GROUP  BY 1
        ) c4 USING (created)
    LEFT JOIN (
        SELECT created
              ,sum(classification_indicator_id)::integer AS segment3
        FROM   classifications
        WHERE  classification_indicator_id = 5
        GROUP  BY 1
        ) c5 USING (created)
    ORDER  BY 1;
    

    Assuming that created is a date, not a timestamp.

    Or, for an even faster query, since this has become a topic:

    SELECT d.created AS data
          ,count(classification_indicator_id = 3 OR NULL)::int * 3 AS segment1
          ,count(classification_indicator_id = 4 OR NULL)::int * 4 AS segment2
          ,count(classification_indicator_id = 5 OR NULL)::int * 5 AS segment3
    FROM (
       SELECT generate_series('2013-03-25'::date
                             ,'2013-04-01'::date
                             ,interval '1 day')::date AS created
        ) d
    LEFT   JOIN classifications c USING (created)
    GROUP  BY 1
    ORDER  BY 1;