Search code examples
rcombinationscombinatoricsprobability-theory

R - Find the chance of each subset in a given universe


Let me start with an easy example of my problem.

Assume a universe of 10 people, where 1 person owns product A and 2 persons own product B
U=10, A=1, B=2

Now I want to find the chances that:
1) a person owns no products ==> (1 - 1/10) * (1 - 2/10) = 0.72
2) a person owns at least 1 product ==> 1 - ((1 - 1/10) * (1 - 2/10)) = 0.28
3) a person owns 2 products ==> (1/10) * (2/10) = 0.02

However, I would like to have a general algorithm that sorts all these options out if there are n products.

Input is given as follows
U <- 10
products <- c('A','B')
owned_by <- c(1, 2)
df <- data.frame(products, owned_by)


Solution

  • I think a closed form solution would involve too many terms and become overly complex. So this problem looks like a perfect candidate for Monte Carlo method.

    set.seed(1984)
    
    U <- 10
    products <- c('A','B')
    owned_by <- c(1,2) 
    df <- data.frame(products, owned_by)
    p = rep(0, nrow(df)+1)
    num.runs = 1000
    
    for(n in 1:num.runs)
    {
      x=c()  ## list of people who own a product
      for (i in 1:nrow(df))
        x = c(x, sample(1:U, df$owned_by[i]))
    
      ## get the number of people who own 0, 1, 2...products
      p[1] = p[1] + (sum(hist(x,breaks=0:U,plot=F)$counts == 0) / U)
      for(i in 1:nrow(df))
        p[i+1] = p[i+1] + (sum(hist(x,breaks=0:U,plot=F)$counts >= i) / U)
    }
    
    p = p / num.runs ## average over all runs
    p
    
    ## 0.7197 0.2803 0.0197