Search code examples
sqlsnowflake-cloud-data-platformquery-optimization

How to retrieve historical data based on condition on one row?


I have a table historical_data

ID Date column_a column_b
1 2011-10-01 a a1
1 2011-11-01 w w1
1 2011-09-01 a a1
2 2011-01-12 q q1
2 2011-02-01 d d1
3 2011-11-01 s s1

I need to retrieve the whole history of an id based on the date condition on any 1 row related to that ID.

date>='2011-11-01' should get me

ID Date column_a column_b
1 2011-10-01 a a1
1 2011-11-01 w w1
1 2011-09-01 a a1
3 2011-11-01 s s1

I am aware you can get this by using a CTE or a subquery like

with selected_id as (
select id from historical_data where date>='2011-11-01' 
)
select hd.* from historical_data hd 
inner join selected_id si on hd.id = si.id

or

select * from historical_data 
where id in (select id from historical_data where date>='2011-11-01')

In both these methods I have to query/scan the table ``historical_data``` twice. I have indexes on both id and date so it's not a problem right now, but as the table grows this may cause issues.

The table above is a sample table, the table I have is about to touch 1TB in size with upwards of 600M rows.

Is there any way to achieve this by only querying the table once? (I am using Snowflake)


Solution

  • Using QUALIFY:

    SELECT *
    FROM historical_data
    QUALIFY MAX(date) OVER(PARTITION BY id) >= '2011-11-01'::DATE;