I have a maths problem that I need help with. It comes from a medical study, but I will use orchards as an example.
Let's say a study conducted in an orchard looked at the proportion of bad apples. The study's results were:
A) Red apples (bad 10; good 90, so the probability of a bad apple is 0.10)
B) Green apples (bad 20; good 80, probability of a bad apple 0.20)
Thus
C) All apples, green and red (bad 30; good 170, probability of a bad apple 0.15)
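A minimal Python sketch of the study arithmetic, using only the counts listed above (the names `study`, `bad_rate` and `pooled` are illustrative). It recomputes the per-colour rates and shows that, because the study sampled 100 apples of each colour, the pooled 0.15 is simply the plain average of 0.10 and 0.20.

```python
from fractions import Fraction

# Counts from the study: (bad, good) for each colour.
study = {"red": (10, 90), "green": (20, 80)}

def bad_rate(bad, good):
    """Proportion of bad apples in a group, kept as an exact fraction."""
    return Fraction(bad, bad + good)

rates = {colour: bad_rate(*counts) for colour, counts in study.items()}

# Pool both colours by adding the counts, then take the proportion.
total_bad = sum(bad for bad, _ in study.values())
total_good = sum(good for _, good in study.values())
pooled = bad_rate(total_bad, total_good)

print(rates["red"], rates["green"], pooled)  # 1/10 1/5 3/20
# With 100 apples of each colour in the study, the pooled rate
# equals the unweighted average of the two per-colour rates.
print(pooled == (rates["red"] + rates["green"]) / 2)  # True
```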
Now, based on that study, I want to estimate how many bad apples I might expect in my orchard.
I have 1,000 apples in my orchard: 600 red and 400 green. Can somebody tell me why the following two estimates don't match, and which one is correct?
Option 1
1,000 apples x probability of bad apple 0.15 = 150 bad apples
Option 2
600 red apples x probability of bad red apple 0.10 = 60 bad red apples
400 green apples x probability of bad green apple 0.20 = 80 bad green apples
This sums to 60 + 80 = 140 bad apples.
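Both options can be reproduced in a few lines of Python; this is a sketch using the numbers from the question, with exact fractions so that no floating-point rounding muddies the comparison (`orchard`, `p_bad` and `p_bad_pooled` are illustrative names).

```python
from fractions import Fraction

# Orchard composition from the question.
orchard = {"red": 600, "green": 400}

# Probabilities of a bad apple taken from the study.
p_bad = {"red": Fraction(1, 10), "green": Fraction(2, 10)}
p_bad_pooled = Fraction(15, 100)

# Option 1: one pooled probability applied to every apple.
option1 = sum(orchard.values()) * p_bad_pooled

# Option 2: each colour gets its own probability, then the results are summed.
option2 = sum(n * p_bad[colour] for colour, n in orchard.items())

print(option1, option2)  # 150 140
```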
So why is there a difference and which estimation is correct?
The second is correct, because it applies the correct probability to each colour and sums the results: E[X + Y] = E[X] + E[Y]. The first is wrong because its 0.15 is the unweighted average of the two probabilities (the study happened to contain 100 apples of each colour), and it applies that average to an orchard that does not contain equal numbers of each type. If the orchard had 500 apples of each kind, both procedures would have given the same answer, 150.
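To illustrate the last point, the sketch below compares the two procedures for a 500/500 orchard and for the 600/400 orchard in the question. The helper `reweighted_rate` is a hypothetical addition: it divides the per-colour estimate by the orchard size to get the bad-apple probability for that particular mix of colours.

```python
from fractions import Fraction

P_BAD = {"red": Fraction(1, 10), "green": Fraction(2, 10)}
P_BAD_POOLED = Fraction(15, 100)  # from the study, which had 100 apples of each colour

def per_colour_estimate(orchard):
    """Expected bad apples: sum of n_colour * P(bad | colour) (linearity of expectation)."""
    return sum(n * P_BAD[colour] for colour, n in orchard.items())

def pooled_estimate(orchard):
    """Expected bad apples if the study's pooled rate is applied to every apple."""
    return sum(orchard.values()) * P_BAD_POOLED

def reweighted_rate(orchard):
    """Bad-apple probability for this orchard's own red/green mix."""
    return per_colour_estimate(orchard) / sum(orchard.values())

for orchard in ({"red": 500, "green": 500}, {"red": 600, "green": 400}):
    print(orchard, per_colour_estimate(orchard), pooled_estimate(orchard), reweighted_rate(orchard))
# {'red': 500, 'green': 500} 150 150 3/20
# {'red': 600, 'green': 400} 140 150 7/50
```

With equal numbers of each colour the pooled rate happens to be the right weighting; with 600/400 the correct rate for this orchard is 0.14, not 0.15.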
EDIT: Based on comments, a clarification:
The question asks how many bad apples you'd expect. The number of bad apples is a random variable Z. It is the sum of the number of bad red apples X and the number of bad green apples Y, so Z = X + Y. We want the expected value E[Z] = E[X + Y], and by linearity of expectation this equals E[X] + E[Y]. That is, we can work out the expected number of bad red apples (E[X] = 600 × 0.10 = 60) separately from the expected number of bad green apples (E[Y] = 400 × 0.20 = 80), then add them together to get the total number of bad apples we expect, 140. This method is essentially the same as using conditional probability, except that we have skipped dividing by the total number of apples; had we done that, we would have used conditional probabilities to find the probability of getting a bad apple (140 / 1,000 = 0.14), which is correct but not what was asked.
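As a numerical check of the expected-value argument, here is a small Monte Carlo sketch. It assumes, beyond what the question states, that each apple is bad independently with its colour's study probability; linearity of expectation itself does not need independence, so this is just one convenient model to simulate. The function name `simulate_bad_apples` is hypothetical.

```python
import random

def simulate_bad_apples(n_red, n_green, p_red, p_green, trials, seed=0):
    """Monte Carlo estimate of E[Z], where Z = X + Y is the number of bad apples.

    X counts bad red apples (each red apple is bad with probability p_red) and
    Y counts bad green apples; every apple is assumed to be independent.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    totals = []
    for _ in range(trials):
        x = sum(rng.random() < p_red for _ in range(n_red))
        y = sum(rng.random() < p_green for _ in range(n_green))
        totals.append(x + y)
    return sum(totals) / trials

mean_z = simulate_bad_apples(600, 400, 0.10, 0.20, trials=2000)
print(round(mean_z, 1))  # close to 600*0.10 + 400*0.20 = 140, not 150
```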