Search code examples
sqlsnowflake-cloud-data-platformdate-range

How to create one row per id per month?


I want to create one row per id per month till the month of end_date.

e.g. first customer id started in Oct and ended in Nov. So I want to get two rows for each month that the customer was active. Besides that, I want to create a column that flags if it was active in that month.

| id | start_date |  end_date  |
|----|------------|------------|
| a  | 2021-10-02 | 2021-11-15 |
| b  | 2021-11-13 | 2021-11-30 |
| c  | 2021-11-16 |            |

When there is no end_date, meaning it is still active, it has to be till the current month.

Example data:

WITH t1 AS (
SELECT 'a' AS id, '2021-10-02'::date AS start_date, '2021-11-15'::date AS end_date UNION ALL
SELECT 'b' AS id, '2021-11-13'::date AS start_date, '2021-11-30'::date AS end_date UNION ALL
SELECT 'c' AS id, '2021-11-16'::date AS start_date, NULL::date AS end_date
    )

The expected result:

| id | start_date |  end_date  |   months   | is_active |
|----|------------|------------|------------|-----------|
| a  | 2021-10-02 | 2021-11-15 | 2021-10-01 | TRUE      |
| a  | 2021-10-02 | 2021-11-15 | 2021-11-01 | FALSE     |
| b  | 2021-11-13 | 2021-11-30 | 2021-11-01 | FALSE     |
| c  | 2021-11-16 |            | 2021-11-01 | TRUE      |
| c  | 2021-11-16 |            | 2021-12-01 | TRUE      |
| c  | 2021-11-16 |            | 2022-01-01 | TRUE      |

How can I achieve that in Snowflake?


Solution

  • so given you have a range, you will need something that spans time to join against, this is where a generator can be used, which I will put into a CTE. I will also use ROW_NUMBER() to generate the sequence of month steps, to make sure there is no gaps. the 200 needs to be hard coded, so put a value that spans enough data for you needs, or pop this into a table.

    WITH months AS (
        SELECT 
            ROW_NUMBER() OVER (ORDER BY NULL) - 1 as rn
        FROM TABLE(generator(rowcount => 200))
    )
    

    next we want to truncate the start_date and find the number of months later that end_date is, and join that to our range

    ), range_prep AS (
        SELECT id,
            start_date,
            end_date,
            date_trunc(month, start_date) as start_month,
            datediff(month, start_month, coalesce(end_date, CURRENT_DATE())) as month_count
        FROM data
    )
    

    joining those together, and doing :

    SELECT id,
        r.start_date,
        r.end_date,
        dateadd(month, m.rn, r.start_month) as months,
        (r.end_date is null) OR (date_trunc(month, r.end_date) > months) AS is_active
    FROM range_prep as r
    JOIN months as m
        ON m.rn <= r.month_count
    ORDER BY 1,2;
    

    putting all together with a CTE for data we have:

    WITH data AS (
        SELECT id, 
            to_date(start_date) as start_date, 
            to_date(end_date) as end_date  
        FROM VALUES 
            ('a','2021-10-02','2021-11-15'),
            ('b','2021-11-13','2021-11-30'),
            ('c','2021-11-16',null)
        v( id, start_date, end_date)
    ), months AS (
        SELECT 
            ROW_NUMBER() OVER (ORDER BY NULL) - 1 as rn
        FROM TABLE(generator(rowcount => 200))
    ), range_prep AS (
        SELECT id,
            start_date,
            end_date,
            date_trunc(month, start_date) as start_month,
            datediff(month, start_month, coalesce(end_date, CURRENT_DATE())) as month_count
        FROM data
    )
    SELECT id,
        r.start_date,
        r.end_date,
        dateadd(month, m.rn, r.start_month) as months,
        (r.end_date is null) OR (date_trunc(month, r.end_date) > months) AS is_active
    FROM range_prep as r
    JOIN months as m
        ON m.rn <= r.month_count
    ORDER BY 1,2;
    

    gives:

    ID START_DATE END_DATE MONTHS IS_ACTIVE
    a 2021-10-02 2021-11-15 2021-10-01 TRUE
    a 2021-10-02 2021-11-15 2021-11-01 FALSE
    b 2021-11-13 2021-11-30 2021-11-01 FALSE
    c 2021-11-16 2021-11-01 TRUE
    c 2021-11-16 2021-12-01 TRUE
    c 2021-11-16 2022-01-01 TRUE