Search code examples
postgresqldatedate-rangegenerate-series

Generate date series by month between two dates and avrage by month in postgresql


I want to create a row for every month between two dates, the first day of every month should be the day of the start date or the first day of every month, and the last date should be the last day of every month or the end date, with average (if date start = 15, then the average should be 15/30) for my table.

input :

product_id | date_start | date_end
1          | 16-01-2020 | 15-03-2020
2          | 07-01-2020 | 22-04-2020

The result should be :

product_id | date_start | date_end    | average
1          | 16-01-2020 | 31-01-2020  | 0.5
1          | 01-02-2020 | 29-02-2020  | 1
1          | 01-03-2020 | 15-03-2020  | 0.5
2          | 07-01-2020 | 31-01-2020  | 0.76  -- (30-07)/30
2          | 01-02-2020 | 29-02-2020  | 1
2          | 01-03-2020 | 31-03-2020  | 1
2          | 01-04-2020 | 22-04-2020  | 0.76

I tried using generate series and date trunc and union

SELECT (date_trunc('month', dt) + INTERVAL '1 MONTH' ):: DATE AS date_start , 
(date_trunc('month', dt) + INTERVAL '2 MONTH - 1 day' ):: DATE AS date_end

FROM generate_series( DATE '2020-01-15', DATE '2020-05-21', interval '1 MONTH' ) AS dt
union select '2020-01-15' as date_start, 
(date_trunc('month', '2020-01-15'::date) + INTERVAL '1 MONTH - 1 day' ):: DATE AS date_end

union select (date_trunc('month', '2020-05-21'::date) ):: DATE AS date_start ,
'2020-05-21' AS date_end
order by date_start

To adding average I calculate the difference between two dates

SELECT (date_trunc('month', dt) + INTERVAL '1 MONTH' ):: DATE AS date_start , 
(date_trunc('month', dt) + INTERVAL '2 MONTH - 1 day' ):: DATE AS date_end,
((date_trunc('month', dt) + INTERVAL '2 MONTH - 1 day' ) - (date_trunc('month', dt) + INTERVAL '1 MONTH' ):: DATE )
FROM generate_series( DATE '2020-01-15', DATE '2020-05-21', interval '1 MONTH' ) AS dt

with this it seemed like I was hit a wall.


Solution

  • The following gives approximately the same result as you desired, only averages deviates. I believe this stems from an inconsistency in the your calculations where the dates are inclusive in some and excludes either start or end date in others, I was inclusive in all. The other area of difference being I used the actual number of days in the month for denominator calculating it instead of 30. This is necessary for Feb to ever have average 1, otherwise max would be 0.97, and full months having 31 days would average 1.03.

    with product_dates(product_id, date_start, date_end) as  
         ( values (1,'2020-01-16'::date,'2020-03-15'::date)
                , (2,'2020-01-07'::date,'2020-04-22'::date)
         ) 
    select product_id, start_date, end_date, round((end_date-start_date+1 ) * 1.0 / (eom-som+1),2) average   
     from (select product_id
                , greatest(date_start,dt::date) start_date 
                , least(date_end,  (dt+interval '1 month' -interval '1 day')::date) end_date
                , dt::date som
                , (dt+interval '1 month' -interval '1 day')::date eom     
           from product_dates 
           cross join generate_series(date_trunc('month', date_start)
                                     ,date_trunc('month', date_end) + interval '1 month' - interval '1 day' 
                                     ,interval '1 month'
                                     ) gs(dt)
          ) s1;
    

    The heart is the generate_series working directly with dates, notice the date manipulation to ensure I had first day and last day of month. Then in the outer portion of the quest I selected those dates or the parameter date or the generated one (greatest and least functions),