Search code examples
sqlpostgresqlcasewindow-functions

create additional date after and before current row and create new column based on it


Lets say I have this kind of data

create table example
(cust_id VARCHAR, product VARCHAR, price float, datetime varchar);

insert into example (cust_id, product, price, datetime)
VALUES
('1', 'scooter', 2000, '2022-01-10'),
('1', 'skateboard', 1500, '2022-01-20'),
('1', 'beefmeat', 300, '2022-06-08'),
('2', 'wallet', 200, '2022-02-25'),
('2', 'hairdryer', 250, '2022-04-28'),
('3', 'skateboard', 1600, '2022-03-29')

I want to make some kind of additional rows, and after that make new column based on this additional rows

My expectation output will like this

cust_id total_price date is_active
1 3500 2022-01 active
1 0 2022-02 active
1 0 2022-03 active
1 0 2022-04 inactive
1 0 2022-05 inactive
1 300 2022-06 active
1 0 2022-07 active
2 0 2022-01 inactive
2 200 2022-02 active
2 0 2022-03 active
2 250 2022-04 active
2 0 2022-05 active
2 0 2022-06 active
2 0 2022-07 inactive
3 0 2022-01 inactive
3 0 2022-02 inactive
3 1600 2022-03 active
3 0 2022-04 active
3 0 2022-05 active
3 0 2022-06 inactive
3 0 2022-07 inactive

the rules is like this

  1. the first month when the customer make transaction is called active, before this transaction called inactive.
    ex: first transaction in month 2, then month 2 is active, month 1 is inactive (look cust_id 2 and 3)
  2. if more than 2 months there isnt transaction, the next month is inactive until there is new transaction is active.
    ex: if last transaction in month 1, then month 2 and month 3 is inactive, and month 4, month 5 inactive if month 6 there is new transaction (look cust_id 1 and 3)

well my first thought is used this code, but I dont know what the next step after it

select *, 
    date_part('month', age(to_date(date, 'YYYY-MM'), to_date(lag(date) over (partition by cust_id order by date),'YYYY-MM')))date_diff
from(
    select 
        cust_id,
        sum(price)total_price,
        to_char(to_date(datetime, 'YYYY-MM-DD'),'YYYY-MM')date
    from example
    group BY
        cust_id,
        date
    order by 
    cust_id, 
    date)test

I'm open to any suggestion


Solution

  • Try the following, an explanation within query comments:

    /* use generate_series to generate a series of dates
       starting from the min date of datetime up to the 
       max datetime with one-month intervals, then do a 
       cross join with the distinct cust_id to map each cust_id 
       to each generated date.*/
    WITH cust_dates AS
    (
      SELECT EX.cust_id, to_char(dts, 'YYYY-mm') dts
      FROM generate_series 
            (
             (SELECT MIN(datetime)::timestamp FROM example), 
             (SELECT MAX(datetime)::timestamp + '2 month'::interval  FROM example),
             '1 month'::interval
            ) dts
      CROSS JOIN (SELECT DISTINCT cust_id FROM example) EX
    ),
    /* do a left join with your table to find prices
       for each cust_id/ month, and aggregate for cust_id, month_date
       to find the sum of prices for each cust_id, month_date.
    */
    monthly_price AS
    (
      SELECT CD.cust_id,
           CD.dts AS month_date,
           COALESCE(SUM(price), 0) total_price
      FROM cust_dates CD LEFT JOIN example EX
      ON CD.cust_id = EX.cust_id AND 
         CD.dts = to_char(EX.datetime, 'YYYY-mm')
      GROUP BY CD.cust_id, CD.dts
    )
    /* Now, we have the sum of monthly prices for each cust_id,
       we can use the max window function with "ROWS BETWEEN 2 PRECEDING AND CURRENT ROW"
       to check if one of the (current month or the previous two months) has a sum of prices > 0.
    */
    SELECT cust_id, month_date, total_price,
      CASE MAX(total_price) OVER 
          (PARTITION BY cust_id ORDER BY month_date 
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
        WHEN 0 THEN 'inactive' 
        ELSE 'active'
      END AS is_active
    FROM monthly_price
    ORDER BY cust_id, month_date
    

    See demo