Search code examples
google-bigqueryoverlappingdistinct-values

BiqQuery - How to find unique ids overlapping for multiple categories in a column?


I am very new to BigQuery and standard SQL. I might not be able to figure out the correct approach to solve a problem. Please help me out.

I have a Country Column and a ID column. Example shown below:

Country | ID
USA | id_1
USA | id_2
USA | id_1
UK | id_1
UK | id_1
UK | id_2
UK | id_3
AUS | id_3
AUS | id_4
AUS | id_2

I want a resulting column such as:

Output 1: UNIQUE OVERLAPPING ID values across all the countries

Country | Unique_overlapping_ids
USA | 2
UK | 3
AUS | 2

Output 2: UNIQUE non-OVERLAPPING ID values across all the countries

Country | Non_Unique_overlapping_ids
USA | 0
UK | 0
AUS | 1

I have 88 distinct Countries and over 5M unique IDs

Please help me. Thank you for your time and patience.


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    select country, 
      count(distinct if(shared, id, null)) as Unique_overlapping_ids,
      count(distinct if(shared, null, id)) as Unique_non_overlapping_ids
    from `project.dataset.table` 
    join (
      select id, count(distinct country) > 1 shared
      from `project.dataset.table`
      group by id
    ) using(id)
    group by country   
    

    if to apply to sample data from your question - output is

    enter image description here