I have a table that consists of 11 columns which I want to calculate the conditional probability on its two columns such as race and income.
Income is a varchar(3) field and has values either "<50K" or ">50K".
Race consists of three values such as "White", "Black" and "Asian".
I want to compute
P[Income = ">50K" | Race = X] for each race given above.
However, I cannot come up with an SQL statement to get this information. I thought to group by race and count where income equals to ">50K", but this only gives count, and I do not know how to divide it by total number of entries in the same query. Can you help me?
You can use aggregation:
select race,
avg(case when income = '>50K' then 1.0 else 0 end)
from t
group by race;