Let me start with an easy example of my problem.
Assume a universe of 10 people, where 1 person owns product A and 2 persons own product B
U=10, A=1, B=2
Now I want to find the chances that:
1) a person owns no products ==> (1 - 1/10) * (1 - 2/10) = 0.72
2) a person owns at least 1 product ==> 1 - ((1 - 1/10) * (1 - 2/10)) = 0.28
3) a person owns 2 products ==> (1/10) * (2/10) = 0.02
However, I would like to have a general algorithm that sorts all these options out if there are n products.
Input is given as follows
U <- 10
products <- c('A','B')
owned_by <- c(1, 2)
df <- data.frame(products, owned_by)
I think a closed form solution would involve too many terms and become overly complex. So this problem looks like a perfect candidate for Monte Carlo method.
set.seed(1984)
U <- 10
products <- c('A','B')
owned_by <- c(1,2)
df <- data.frame(products, owned_by)
p = rep(0, nrow(df)+1)
num.runs = 1000
for(n in 1:num.runs)
{
x=c() ## list of people who own a product
for (i in 1:nrow(df))
x = c(x, sample(1:U, df$owned_by[i]))
## get the number of people who own 0, 1, 2...products
p[1] = p[1] + (sum(hist(x,breaks=0:U,plot=F)$counts == 0) / U)
for(i in 1:nrow(df))
p[i+1] = p[i+1] + (sum(hist(x,breaks=0:U,plot=F)$counts >= i) / U)
}
p = p / num.runs ## average over all runs
p
## 0.7197 0.2803 0.0197