Search code examples
mysqlgoogle-bigquerywhere-clause

SQL/Big Query: identify counts to meet two or more conditions


I am new to SQL and writing queries on Google Big Query. I have a question regards whether I can select counts of id to meet one or more conditions, two or more conditions, etc.

For a simple case, say I have a group of people with different disease conditions. If I want to select how many of them have one or more conditions I can go below (where diease_X is a flag and 0 means not having the disease for this person 1 represents have).

select count(distinct ids)
from disease_table
where disease_A=1 or disease_B=1 or disease_C=1

If I want to find the number of people have all diseases I can change or to and in my where conditions. But how do I count the number of people have 2 or more conditions? I think it is possible to solve it by listing all possibilities of disease, or using the total count to subtract number of people with only 1 or 0 disease.

However, I would like to know if there is a simple way for doing this since I have 2 follow up questions:

    1. My data table has 20+ diseases to work on, so it will make a long list in the where clause if I need to list them all.
    1. I am also curious to find out how to know the counts for three or more conditions, four or more conditions...

Any thought on this would be super helpful. I am currently writing a query in Google Big Query, so hopefully there is a magic clause I can use to make it happen.


Solution

  • UNPIVOT might help solve your problem:

    1. Reshape your table using UNPIVOT first.
    CREATE TEMP TABLE disease_table AS
    SELECT 'a' AS ids, 1 AS disease_A, 1 AS disease_B, 1 AS disease_C
     UNION ALL
    SELECT 'b' AS ids, 1 AS disease_A, 0 AS disease_B, 1 AS disease_C
     UNION ALL
    SELECT 'c' AS ids, 1 AS disease_A, 1 AS disease_B, 0 AS disease_C
    ;
    
    SELECT * 
      FROM disease_table UNPIVOT (has_disease FOR disease_name IN (disease_A, disease_B, disease_C))
    ;
    

    Original table will turn into one like below.

    enter image description here

    1. Count the number of diseases of each people.
    SELECT ids, SUM(has_disease) number_of_disease 
      FROM disease_table UNPIVOT (has_disease FOR disease_name IN (disease_A, disease_B, disease_C))
     GROUP BY ids 
    ;
    

    output:

    enter image description here

    1. For finding out the people having over n diseases, HAVING clause would be helpful.
    SELECT ids, SUM(has_disease) number_of_disease 
      FROM disease_table UNPIVOT (has_disease FOR disease_name IN (disease_A, disease_B, disease_C))
     GROUP BY ids 
    HAVING number_of_disease > 2
    

    enter image description here