I have the following SQL that attempts to grab the latest version of every row in a results table, i.e a singular row that has a unique confirmation_number, the max seq and max batch, this is using append-only replication (docs). However it is faulty returning no results, the inner join works as expected but not the outter. Expected result
confirmation_number | date | seq | batch
23742902 | date | max(seq)| max(batch)
SELECT DISTINCT r.*
FROM `results_table` r
INNER JOIN (
SELECT confirmation_number,
date,
MAX(_sdc_sequence) AS seq,
MAX(_sdc_batched_at) AS batch
FROM `results_table`
GROUP BY confirmation_number, date) rr
ON r.confirmation_number = rr.confirmation_number
AND r.date = rr.date
AND r._sdc_sequence = rr.seq
AND r._sdc_batched_at = rr.batch
This is the table I want to query against, and my sql is returning no result set. It should return me a singular row that contains a distinct confirmation_number, max seq and max batch
Below is for BigQuery Standard SQL
I am quite guessing here about your expected result - but below looks to me should give you expected
#standardSQL
SELECT AS VALUE ARRAY_AGG(r ORDER BY seq DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.results_table` r
GROUP BY confirmation_number
As you can see here - it returns one row per confirmation_number - the row which has max seq value for that confirmation_number with all respective values (date, batch, etc.)
Not sure if you need to group also by date
(as it is in the query in your question - but in my mind it is not needed) - but if you do need - it is easy to add