Search code examples
sqlgoogle-bigquerylag

SELECT previous group value based on conditions in Google Big Query


The goal is to populate the value of the previous group for all applicable rows and include conditions. Consitions are specified in the STATUS column. The query needs to be adjusted for condition STATUS = A.

Data Looks Like this:

DATE        ID   VALUE       GROUP_ID   STATUS
2021-06-01  1   New York       1          A
2021-06-02  1   New York       1          A
2021-06-03  1   New York       1          B
2021-06-04  1   New York       1          A
2021-06-05  1   Boston         2          A
2021-06-06  1   Boston         2          A
2021-06-07  1   San Francisco  3          A
2021-06-08  1   San Francisco  3          A
2021-06-09  1   New York       4          A 

Intended Result: Data Looks Like this:

DATE        ID   VALUE       GROUP_ID   STATUS  PREVIOUS_VALUE
2021-06-01  1   New York       1          A           NA
2021-06-02  1   New York       1          A           NA
2021-06-03  1   New York       1          B           NA   
2021-06-04  1   New York       1          A           NA
2021-06-05  1   Boston         2          A         New York
2021-06-06  1   Boston         2          A         New York
2021-06-07  1   San Francisco  3          A         Boston
2021-06-08  1   San Francisco  3          A         Boston 
2021-06-09  1   New York       4          A         San Francisco

Attempts till now

select *, last_vale(VALUE IGNORE NULLS) OVER (partition by ID, GROUP_ID order by DATE) from ( 
table)A
select *, lag(VALUE) OVER (partition by ID, GROUP_ID order by DATE) from ( 
table)A

I have a backup plan to create a table which will hold unique values based on conditions and then can run UPDATE statements based on the nearest smaller GROUP_ID but would rather have a more sustainable solution in place.

Thanks.

First Modification: Exclude Destination Titled 'Boston' I tried to use case when value not in ('Boston') then ...

DATE        ID   VALUE       GROUP_ID   STATUS  PREVIOUS_VALUE
2021-06-01  1   New York       1          A           NA
2021-06-02  1   New York       1          A           NA
2021-06-03  1   New York       1          B           NA   
2021-06-04  1   New York       1          A           NA
2021-06-05  1   Boston         2          A         New York
2021-06-06  1   Boston         2          A         New York
2021-06-07  1   San Francisco  3          A         San Francisco
2021-06-08  1   San Francisco  3          A         San Francisco
2021-06-09  1   New York       4          A         San Francisco

Solution

  • How about using a window frame specification?

    select t.*,
           max(value) over (order by group_id
                            range between 1 preceding and 1 preceding
                           ) as prev_value
    from t;
    

    If the group_ids are sequential but have gaps, then you can use dense_rank() to get one that works:

    select t.*,
           max(value) over (order by dense_group_id
                            range between 1 preceding and 1 preceding
                           ) as prev_value
    from (select t.*,
                 dense_rank() over (order by group_id) as dense_group_id
          from t
         ) t