I have the following table (see above).
What is the most efficient way to delete duplicates by appId AND clientId so the resulting table contains only those rows with minimum DIFF
Ex: 260848 should be yandex as source
Thanks!
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY DIFF LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY appId, clientId