I am very new to BigQuery and standard SQL. I might not be able to figure out the correct approach to solve a problem. Please help me out.
I have a Country Column and a ID column. Example shown below:
Country | ID
USA | id_1
USA | id_2
USA | id_1
UK | id_1
UK | id_1
UK | id_2
UK | id_3
AUS | id_3
AUS | id_4
AUS | id_2
I want a resulting column such as:
Output 1: UNIQUE OVERLAPPING ID values across all the countries
Country | Unique_overlapping_ids
USA | 2
UK | 3
AUS | 2
Output 2: UNIQUE non-OVERLAPPING ID values across all the countries
Country | Non_Unique_overlapping_ids
USA | 0
UK | 0
AUS | 1
I have 88 distinct Countries and over 5M unique IDs
Please help me. Thank you for your time and patience.
Below is for BigQuery Standard SQL
#standardSQL
select country,
count(distinct if(shared, id, null)) as Unique_overlapping_ids,
count(distinct if(shared, null, id)) as Unique_non_overlapping_ids
from `project.dataset.table`
join (
select id, count(distinct country) > 1 shared
from `project.dataset.table`
group by id
) using(id)
group by country
if to apply to sample data from your question - output is