I have a table that has the following columns:
Timestamp | Product ID | Column A | Column B | etc... |
---|---|---|---|---|
2024-12-03 | id_123 | AAA | BBB | CCC |
2024-12-03 | id_234 | AAA | BBB | CCC |
2024-12-02 | id_123 | AAA | BBB | CCC |
2024-12-02 | id_234 | AAA | BBB | CCC |
I want to build it so that it only returns the most recent occurrences for each product ID. So for the example above, it would only return:
Timestamp | Product ID | Column A | Column B | etc... |
---|---|---|---|---|
2024-12-03 | id_123 | AAA | BBB | CCC |
2024-12-03 | id_234 | AAA | BBB | CCC |
I've tried using MAX(Timestamp)
and DISTINCT
, but can't seem to get those to work.
Does anyone have any suggestions?
Thank you!
I tried using MAX(Timestamp)
but it didn't seem to actually limit it to the most recent date.
DISTINCT
seems to only work by comparing the values of the entire row to another row.
What you're looking for here is a window/analytic function.
SELECT
timestamp,
product_id,
column_a,
column_b,
FROM
your_dataset.your_table
QUALIFY
ROW_NUMBER() OVER(latest_product_record) = 1
WINDOW
latest_product_record AS (
PARTITION BY product_id
ORDER BY timestamp DESC
)
OVER
contains the conditions for defining the window over which you will calculate some quantityROW_NUMBER
function to return a number for each row based on its position within its associated windowQUALIFY
is an equivalent of WHERE
that takes a window clause as its condition.WINDOW
allows you to declare the definition of a window (which goes in the OVER
clause; a convenient way to keep the query tidy)See the official documentation for more detail.
ROW_NUMBER
returns 1
are included (i.e. only rows with the last timestamp for that product ID)