Search code examples
sqlgroup-byprobability

Calculating Conditional Probabilities with SQL


I have a table that consists of 11 columns which I want to calculate the conditional probability on its two columns such as race and income.

Income is a  varchar(3) field and has values either "<50K" or ">50K". 
Race consists of three values such as "White", "Black" and "Asian". 

I want to compute

P[Income = ">50K" | Race = X] for each race given above.

However, I cannot come up with an SQL statement to get this information. I thought to group by race and count where income equals to ">50K", but this only gives count, and I do not know how to divide it by total number of entries in the same query. Can you help me?


Solution

  • You can use aggregation:

    select race,
           avg(case when income = '>50K' then 1.0 else 0 end)
    from t
    group by race;