Search code examples
sqlgoogle-bigquerydata-warehouse

Get the FIRST_VALUE from the current partition since stage has changed


I want to use the FIRST_VALUE() function in order to have the first "updated_at" value. But what I want is to have the first value since the stage has changed.

with values as (

    select 1 as deal, 2 as stage, '2020-11-10' as updated_at, '2020-11-10' as first_updated_at
    union all
    select 1 as deal, 2 as stage, '2020-11-11' as updated_at, '2020-11-10' as first_updated_at 
    union all
    select 1 as deal, 3 as stage, '2020-11-12' as updated_at, '2020-11-12' as first_updated_at
    union all 
    select 1 as deal, 4 as stage, '2020-11-13' as updated_at, '2020-11-13' as first_updated_at
    union all 
    select 1 as deal, 4 as stage, '2020-11-14' as updated_at, '2020-11-13' as first_updated_at
    union all 
    select 1 as deal, 2 as stage, '2020-11-15' as updated_at, '2020-11-15' as first_updated_at
    union all 
    select 1 as deal, 2 as stage, '2020-11-16' as updated_at, '2020-11-15' as first_updated_at

)
select * from values

I am trying to use the FIRST VALUE function like this:

FIRST_VALUE(updated_at) OVER(PARTITION BY deal, stageORDER BY updated_at ASC

Am I missing something here or what I want is not possible?

Thanks in advance


Solution

  • You can do this using window functions. First, look at the previous stage to see if it is different from the previous row. Then use a cumulative max to take the updated_at when this occurs:

    with values as (
    
        select 1 as deal, 2 as stage, '2020-11-10' as updated_at, '2020-11-10' as first_updated_at
        union all
        select 1 as deal, 2 as stage, '2020-11-11' as updated_at, '2020-11-10' as first_updated_at 
        union all
        select 1 as deal, 3 as stage, '2020-11-12' as updated_at, '2020-11-12' as first_updated_at
        union all 
        select 1 as deal, 4 as stage, '2020-11-13' as updated_at, '2020-11-13' as first_updated_at
        union all 
        select 1 as deal, 4 as stage, '2020-11-14' as updated_at, '2020-11-13' as first_updated_at
        union all 
        select 1 as deal, 2 as stage, '2020-11-15' as updated_at, '2020-11-15' as first_updated_at
        union all 
        select 1 as deal, 2 as stage, '2020-11-16' as updated_at, '2020-11-15' as first_updated_at
    
    )
    select v.*,
           max(case when stage <> prev_stage or prev_stage is null then updated_at end) over (partition by deal order by updated_at) as imputed_first_updated_at
    from (select v.*,
                 lag(stage) over (partition by deal order by updated_at) as prev_stage
          from values v
         ) v