Search code examples
sqldatetimegoogle-bigqueryrowpartitioning

partition over period of time for a given column SQL BigQuery


My query outputs data - row by row - for each record I grab a value from last_modified_date column that is the latest date in that column BUT is not later than the date column's value. I save new value in column custom_last_modified_date. Code works fine, but it only checks for the latest date available for one date - instead of every row in a table. Data looks like this:

id           date           last_modified_date
A           02/28/22          2017-02-28 22:44
A           03/05/22          2017-02-28 05:14
A           03/05/22          2017-02-28 07:49
A           03/22/22          2017-02-28 06:09
A           03/22/22          2022-03-01 06:49
B           03/25/22          2022-03-20 07:49
B           03/25/22          2022-04-01 09:24 

Code:

SELECT 
id, 
date,
MAX(
        IF(
            date(
                string(
                    TIMESTAMP(
                        DATETIME(
                            parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
                        )
                    )
                )
            ) <= date,
            date(
                string(
                    TIMESTAMP(
                        DATETIME(
                            parse_datetime('%Y-%m-%d %H:%M', last_modified_date)
                        )
                    )
                )
            ),
            null
        )
    ) OVER (PARTITION BY date, id) as custom_last_modified_date
FROM `my_table`

Output is this:

id           date        custom_last_modified_date
A           02/28/22          02/28/17
A           03/05/22          02/28/17
A           03/05/22          02/28/17
A           03/22/22          03/01/22
A           03/22/22          03/01/22
B           03/25/22          03/20/22
B           03/25/22          03/20/22

Desired output is:

id           date        custom_last_modified_date
A           02/28/22          02/28/22
A           03/05/22          03/01/22
A           03/05/22          03/01/22
A           03/22/22          03/01/22
A           03/22/22          03/01/22
B           03/25/22          03/20/22
B           03/25/22          03/20/22

Solution

  • I tried your query and your just have a mistake on partion by, your query will work if you removed date in your partition by clause. Your query should look like this:

    NOTE: I just created a CTE to convert dates to make it readable.

    with sample_data as (
    select 'A' as id, '03/05/22' as date_field, '2017-02-28 22:44' as last_modified_date,
    union all select 'A' as id, '03/05/22' as date_field, '2017-02-28 05:14' as last_modified_date,
    union all select 'A' as id, '03/05/22' as date_field, '2017-02-28 07:49' as last_modified_date,
    union all select 'A' as id, '03/22/22' as date_field, '2017-02-28 06:09' as last_modified_date,
    union all select 'A' as id, '03/22/22' as date_field, '2022-03-01 06:49' as last_modified_date,
    union all select 'B' as id, '03/25/22' as date_field, '2022-03-20 07:49' as last_modified_date,
    union all select 'B' as id, '03/25/22' as date_field, '2022-04-01 09:24' as last_modified_date,
    ),
    
    conv_data as (
    
    select 
    id,
    format_datetime('%m/%d/%y',parse_date('%m/%d/%y',date_field)) as date_field,
    format_datetime('%m/%d/%y',parse_datetime('%Y-%m-%d %H:%M', last_modified_date)) as last_modified_date,
    from sample_data
    )
    
    select 
    id,
    date_field,
    last_modified_date,
    max(if(last_modified_date <= date_field, last_modified_date, null)) over (partition by id) as custom_last_modified_date,
    
    from conv_data
    
    

    Output:

    enter image description here