I am trying to calculate the conditional probabilities P(A=a|B=b,C=c), where a is an element of ['high', 'medium', 'low'], b is an element of ['0-20', '20-40', '40-60', '60-80', '80-inf'] and c is an element of ['male', 'female'].
I have a dictionary with the frequencies that looks like this:
{('high', '0-20', 'female'): 11,
('high', '0-20', 'male'): 43,
('high', '20-40', 'female'): 10,
('high', '20-40', 'male'): 17,
('high', '40-60', 'female'): 11,
('high', '40-60', 'male'): 10,
('high', '60-80', 'female'): 2,
('high', '60-80', 'male'): 1,
('high', '80-inf', 'female'): 0,
('high', '80-inf', 'male'): 0,
('low', '0-20', 'female'): 130,
('low', '0-20', 'male'): 159,
('low', '20-40', 'female'): 186,
('low', '20-40', 'male'): 297,
('low', '40-60', 'female'): 71,
('low', '40-60', 'male'): 144,
('low', '60-80', 'female'): 35,
('low', '60-80', 'male'): 53,
('low', '80-inf', 'female'): 1,
('low', '80-inf', 'male'): 2,
('medium', '0-20', 'female'): 90,
('medium', '0-20', 'male'): 194,
('medium', '20-40', 'female'): 72,
('medium', '20-40', 'male'): 116,
('medium', '40-60', 'female'): 46,
('medium', '40-60', 'male'): 49,
('medium', '60-80', 'female'): 12,
('medium', '60-80', 'male'): 22,
('medium', '80-inf', 'female'): 1,
('medium', '80-inf', 'male'): 2}
What I want is a dictionary that looks like:
{('high', '0-20', 'female'): P(A='high' | B='0-20', C='female'),
 ...
}
So, if I'm understanding your comment correctly, what you are having trouble with is the concept of calculating the conditional probability when there are two or more "conditions" as opposed to a single condition.
It's been quite a while since I last took a probability/statistics class, but I think what you need to do is break this down into separate problems. From the data, you can easily calculate P(B=b) and P(C=c), but B and C are not necessarily independent, so what you actually need is the joint probability that B=b AND C=c. You should be able to get that directly from the data as well: e.g. P(B='0-20', C='female') is just the sum of the counts that match both conditions (over all values of A) divided by the total count. If you call this joint probability P(X), then it is straightforward, from the definition of conditional probability, to calculate P(A=a|X) = P(A=a ∩ X) / P(X).
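A minimal Python sketch of that calculation, assuming the frequency dictionary from the question is stored in a variable called `counts` (the names `counts`, `conditional_probs` and `probs` are illustrative, not from the question). Since P(A=a ∩ X) and P(X) are both obtained by dividing by the same total count N, N cancels, so P(A=a|B=b,C=c) is simply the count for (a, b, c) divided by the sum of the counts for (b, c) over all values of a. The sketch uses that form directly.

```python
from collections import defaultdict

# Frequencies from the question: (A, B, C) -> count
counts = {
    ('high', '0-20', 'female'): 11, ('high', '0-20', 'male'): 43,
    ('high', '20-40', 'female'): 10, ('high', '20-40', 'male'): 17,
    ('high', '40-60', 'female'): 11, ('high', '40-60', 'male'): 10,
    ('high', '60-80', 'female'): 2, ('high', '60-80', 'male'): 1,
    ('high', '80-inf', 'female'): 0, ('high', '80-inf', 'male'): 0,
    ('low', '0-20', 'female'): 130, ('low', '0-20', 'male'): 159,
    ('low', '20-40', 'female'): 186, ('low', '20-40', 'male'): 297,
    ('low', '40-60', 'female'): 71, ('low', '40-60', 'male'): 144,
    ('low', '60-80', 'female'): 35, ('low', '60-80', 'male'): 53,
    ('low', '80-inf', 'female'): 1, ('low', '80-inf', 'male'): 2,
    ('medium', '0-20', 'female'): 90, ('medium', '0-20', 'male'): 194,
    ('medium', '20-40', 'female'): 72, ('medium', '20-40', 'male'): 116,
    ('medium', '40-60', 'female'): 46, ('medium', '40-60', 'male'): 49,
    ('medium', '60-80', 'female'): 12, ('medium', '60-80', 'male'): 22,
    ('medium', '80-inf', 'female'): 1, ('medium', '80-inf', 'male'): 2,
}


def conditional_probs(counts):
    """Return {(a, b, c): P(A=a | B=b, C=c)} computed from raw joint counts."""
    # Count for each condition X = (B=b, C=c), summed over all values of A.
    # Dividing this by the total count would give the joint probability P(X).
    bc_totals = defaultdict(int)
    for (a, b, c), n in counts.items():
        bc_totals[(b, c)] += n

    probs = {}
    for (a, b, c), n in counts.items():
        denom = bc_totals[(b, c)]
        # P(A=a | X) = P(A=a ∩ X) / P(X); the total count cancels out.
        # A condition with no observations gets NaN instead of a ZeroDivisionError.
        probs[(a, b, c)] = n / denom if denom else float('nan')
    return probs


probs = conditional_probs(counts)
print(probs[('high', '0-20', 'female')])  # 11 / (11 + 130 + 90) = 11 / 231
```

As a quick sanity check, for every fixed (b, c) the probabilities over 'high', 'medium' and 'low' should add up to 1.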
It might be a good idea to repost this or migrate it to the Math SE site, though, to get confirmation and/or a better answer...