I am new to SQL and writing queries on Google Big Query. I have a question regards whether I can select counts of id to meet one or more conditions, two or more conditions, etc.
For a simple case, say I have a group of people with different disease conditions. If I want to select how many of them have one or more conditions I can go below (where diease_X is a flag and 0 means not having the disease for this person 1 represents have).
select count(distinct ids)
from disease_table
where disease_A=1 or disease_B=1 or disease_C=1
If I want to find the number of people have all diseases I can change or
to and
in my where
conditions. But how do I count the number of people have 2 or more conditions? I think it is possible to solve it by listing all possibilities of disease, or using the total count to subtract number of people with only 1 or 0 disease.
However, I would like to know if there is a simple way for doing this since I have 2 follow up questions:
Any thought on this would be super helpful. I am currently writing a query in Google Big Query, so hopefully there is a magic clause I can use to make it happen.
UNPIVOT
might help solve your problem:
UNPIVOT
first.CREATE TEMP TABLE disease_table AS
SELECT 'a' AS ids, 1 AS disease_A, 1 AS disease_B, 1 AS disease_C
UNION ALL
SELECT 'b' AS ids, 1 AS disease_A, 0 AS disease_B, 1 AS disease_C
UNION ALL
SELECT 'c' AS ids, 1 AS disease_A, 1 AS disease_B, 0 AS disease_C
;
SELECT *
FROM disease_table UNPIVOT (has_disease FOR disease_name IN (disease_A, disease_B, disease_C))
;
Original table will turn into one like below.
SELECT ids, SUM(has_disease) number_of_disease
FROM disease_table UNPIVOT (has_disease FOR disease_name IN (disease_A, disease_B, disease_C))
GROUP BY ids
;
output:
n
diseases, HAVING
clause would be helpful.SELECT ids, SUM(has_disease) number_of_disease
FROM disease_table UNPIVOT (has_disease FOR disease_name IN (disease_A, disease_B, disease_C))
GROUP BY ids
HAVING number_of_disease > 2