Search code examples
sqlgoogle-bigquerygreatest-n-per-groupwindow-functionsranking

Nested partitioning and ranking in google big query


Below is how the data looks like- enter image description here

I want to sort this data on different levels to achieve the final output.

Level 1: Whenever there are duplicate values for name, I want to get the least ranking for each distinct (id, name,last_name, gender) tuple.

Level 1 Result: enter image description here

Level 2: In level 2, I want to get the least ranking for each gender category for a particular name.

Level 2 Result: enter image description here

Final output: For each name, if 'male' and 'female' rank is the same then return the whichever occurs first in the table. If it is different return the record with the least rank.

Final result expected- enter image description here


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    SELECT AS VALUE ARRAY_AGG(t ORDER BY ranking, id LIMIT 1)[OFFSET(0)]
    FROM `project.dataset.table` t
    GROUP BY name