Search code examples
snowflake-cloud-data-platforminterpolation

Interpolation in Snowflake. Anyone know if this is possible via some kind of inbuilt function (or otherwise)?


The frequency my input data (timestamps) are non-uniform in distribution. I'm hoping to use interpolation to make the x axis uniform.

So far, I can't find anything in the Snowflake documentation.

Input: enter image description here


Solution

  • Here is one approach -

    Below date CTE has non-uniform dates.

    Idea is - We get min date and max date from the data-set. Next, we find number of days between max and min dates. Next, we calculate how many days to add starting from min date to genenerate interpolated data-set for dates. For table generator we have to give a constant, hence we use a large number and use qualify to stop at the number of dates needed.

    with date_cte(dates) as
    (select * from values
    ('2022-01-12'::date),
    ('2022-02-12'::date),
    ('2022-02-19'::date),
    ('2022-03-11'::date),
    ('2022-04-10'::date),
    ('2022-04-12'::date),
    ('2022-05-02'::date),
    ('2022-06-14'::date),
    ('2022-06-28'::date)
    ), agg_cte as (
    select dates, 
    max(dates) over (order by null) mx ,
    min(dates) over (order by null)  mn,
    count(*) over (order by null)  cnt,
    datediff(day,mn,mx) days_btwn, days_btwn/(cnt-1) days_incr 
    from date_cte)
    select a.dates,
    a.mn,
    a.mx,
    a.days_btwn,
    a.days_incr
    ,row_number() over(order by seq4()) rn,
    dateadd(day,a.days_incr * (rn-1), a.mn) interpolated_dates
    from agg_cte a,table(generator(rowcount=>10000)) 
    qualify row_number() over (order by null)<=cnt
    order by interpolated_dates asc;
    

    Will yield following data-set (refer column INTERPOLATED_DATES) -

    DATES MN MX DAYS_BTWN DAYS_INCR RN INTERPOLATED_DATES
    2022-01-12 2022-01-12 2022-06-28 167 20.875000 1 2022-01-12
    2022-02-12 2022-01-12 2022-06-28 167 20.875000 2 2022-02-02
    2022-02-19 2022-01-12 2022-06-28 167 20.875000 3 2022-02-23
    2022-03-11 2022-01-12 2022-06-28 167 20.875000 4 2022-03-16
    2022-04-10 2022-01-12 2022-06-28 167 20.875000 5 2022-04-06
    2022-04-12 2022-01-12 2022-06-28 167 20.875000 6 2022-04-26
    2022-05-02 2022-01-12 2022-06-28 167 20.875000 7 2022-05-17
    2022-06-14 2022-01-12 2022-06-28 167 20.875000 8 2022-06-07
    2022-06-28 2022-01-12 2022-06-28 167 20.875000 9 2022-06-28

    Updated (included cities) answer -

    with date_cte(dates,cities) as
    (select * from values
    ('2022-01-12'::date,'citi-1'),
    ('2022-02-12'::date,'citi-1'),
    ('2022-02-19'::date,'citi-1'),
    ('2022-03-11'::date,'citi-1'),
    ('2022-04-10'::date,'citi-1'),
    ('2022-04-12'::date,'citi-1'),
    ('2022-05-02'::date,'citi-1'),
    ('2022-06-14'::date,'citi-1'),
    ('2022-06-28'::date,'citi-1'),
    ('2022-01-11'::date,'citi-2'),
    ('2022-02-12'::date,'citi-2'),
    ('2022-02-19'::date,'citi-2'),
    ('2022-03-11'::date,'citi-2'),
    ('2022-04-10'::date,'citi-2'),
    ('2022-04-12'::date,'citi-2'),
    ('2022-05-02'::date,'citi-2'),
    ('2022-06-14'::date,'citi-2'),
    ('2022-07-11'::date,'citi-2'),
    ('2022-06-28'::date,'citi-2'),
    ('2022-01-07'::date,'citi-3'),
    ('2022-02-12'::date,'citi-3'),
    ('2022-02-19'::date,'citi-3'),
    ('2022-03-11'::date,'citi-3'),
    ('2022-04-10'::date,'citi-3'),
    ('2022-04-12'::date,'citi-3'),
    ('2022-05-02'::date,'citi-3'),
    ('2022-06-14'::date,'citi-3'),
    ('2022-06-28'::date,'citi-3'),
    ('2022-06-30'::date,'citi-3'),
    ('2022-07-21'::date,'citi-3')
    ), agg_cte as (
    select cities,dates, 
    row_number() over(partition by cities order by null) rn,
    max(dates) over (partition by cities order by null) mx ,
    min(dates) over (partition by cities order by null)  mn,
    count(*) over (partition by cities order by null)  cnt,
    datediff(day,mn,mx) days_btwn, days_btwn/(cnt-1) days_incr 
    from date_cte)
    select a.cities,
    a.dates,
    a.mn,
    a.mx,
    a.days_btwn,
    a.days_incr,
    dateadd(day,a.days_incr * (rn-1), a.mn) interpolated_dates
    from agg_cte a
    order by cities,interpolated_dates,dates asc;
    
    CITIES DATES MN MX DAYS_BTWN DAYS_INCR INTERPOLATED_DATES
    citi-1 2022-02-12 2022-01-12 2022-06-28 167 20.875000 2022-01-12
    citi-1 2022-02-19 2022-01-12 2022-06-28 167 20.875000 2022-02-02
    citi-1 2022-03-11 2022-01-12 2022-06-28 167 20.875000 2022-02-23
    citi-1 2022-04-10 2022-01-12 2022-06-28 167 20.875000 2022-03-16
    citi-1 2022-04-12 2022-01-12 2022-06-28 167 20.875000 2022-04-06
    citi-1 2022-05-02 2022-01-12 2022-06-28 167 20.875000 2022-04-26
    citi-1 2022-06-14 2022-01-12 2022-06-28 167 20.875000 2022-05-17
    citi-1 2022-06-28 2022-01-12 2022-06-28 167 20.875000 2022-06-07
    citi-1 2022-01-12 2022-01-12 2022-06-28 167 20.875000 2022-06-28
    citi-2 2022-05-02 2022-01-11 2022-07-11 181 20.111111 2022-01-11
    citi-2 2022-06-28 2022-01-11 2022-07-11 181 20.111111 2022-01-31
    citi-2 2022-07-11 2022-01-11 2022-07-11 181 20.111111 2022-02-20
    citi-2 2022-06-14 2022-01-11 2022-07-11 181 20.111111 2022-03-12
    citi-2 2022-04-12 2022-01-11 2022-07-11 181 20.111111 2022-04-01
    citi-2 2022-04-10 2022-01-11 2022-07-11 181 20.111111 2022-04-22
    citi-2 2022-03-11 2022-01-11 2022-07-11 181 20.111111 2022-05-12
    citi-2 2022-02-19 2022-01-11 2022-07-11 181 20.111111 2022-06-01
    citi-2 2022-02-12 2022-01-11 2022-07-11 181 20.111111 2022-06-21
    citi-2 2022-01-11 2022-01-11 2022-07-11 181 20.111111 2022-07-11
    citi-3 2022-01-07 2022-01-07 2022-07-21 195 19.500000 2022-01-07
    citi-3 2022-02-12 2022-01-07 2022-07-21 195 19.500000 2022-01-27
    citi-3 2022-02-19 2022-01-07 2022-07-21 195 19.500000 2022-02-15
    citi-3 2022-03-11 2022-01-07 2022-07-21 195 19.500000 2022-03-07
    citi-3 2022-04-10 2022-01-07 2022-07-21 195 19.500000 2022-03-26
    citi-3 2022-04-12 2022-01-07 2022-07-21 195 19.500000 2022-04-15
    citi-3 2022-05-02 2022-01-07 2022-07-21 195 19.500000 2022-05-04
    citi-3 2022-06-14 2022-01-07 2022-07-21 195 19.500000 2022-05-24
    citi-3 2022-06-28 2022-01-07 2022-07-21 195 19.500000 2022-06-12
    citi-3 2022-06-30 2022-01-07 2022-07-21 195 19.500000 2022-07-02
    citi-3 2022-07-21 2022-01-07 2022-07-21 195 19.500000 2022-07-21