
Weighted Sampling with multiple probability vectors in R


My question is similar to this one:

Weighted sampling with 2 vectors

I have a dataset with 1000 observations and 4 columns. I want to sample 200 observations from it with replacement.

The problem is that I need a different probability vector for each column. For example, for the first column I want equal probabilities, c(0.001, 0.001, 0.001, 0.001, ...). For the second column I want something different, like c(0.0005, 0.0002, ...). Of course, each probability vector sums to 1.

I know sample() can do this with a single probability vector, but I am not sure which command handles several. Please help!

Thank you in advance! Colamonkey


Solution

  • data frame with sample probabilities

    # your data has 1000 rows and 4 columns; this small
    # example only illustrates the procedure
    samp_prob <- data.frame(A = rep(.25, 4), B = c(.5, .1, .2, .2), C = c(.3, .6, .05, .05))
    

  • data frame of values to sample from with replacement

    df <- data.frame(a = 1:4, b = 2:5, c = 3:6)
    

  • sampling

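    # sample each column of df using the matching column of samp_prob as weights;
    # the draws are random, so the output differs between runs unless set.seed() is called first
    sam <- mapply(function(x, y) sample(x, size = 200, replace = TRUE, prob = y), df, samp_prob)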
    head(sam)
         a b c
    [1,] 4 5 6
    [2,] 1 2 4
    [3,] 1 2 4
    [4,] 4 4 4
    [5,] 4 4 4
    [6,] 1 2 4
    
    # equivalent form: size and replace are matched by name, so the
    # unnamed samp_prob column falls through to sample()'s prob argument
    mapply(df, samp_prob, FUN = sample, size = 200, replace = TRUE)
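
Scaled up to the sizes in the question (1000 rows, 4 columns), the same pattern might look like the sketch below. The data, the column names, and the weights are made-up placeholders, and the seed is fixed only so the run is reproducible. Note that each column is sampled independently, so a row of the result does not correspond to a row of the original data.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility

n <- 1000

# hypothetical data: 1000 observations, 4 columns
df <- data.frame(a = rnorm(n), b = runif(n), c = rpois(n, 3), d = seq_len(n))

# hypothetical raw weights: one column of weights per data column
w <- data.frame(a = rep(1, n), b = runif(n), c = runif(n), d = seq_len(n))

# sample() rescales prob internally, but dividing by the column sum
# makes every probability vector sum to 1 explicitly
samp_prob <- as.data.frame(lapply(w, function(p) p / sum(p)))

# draw 200 values per column, each column with its own probability vector
sam <- mapply(function(x, p) sample(x, size = 200, replace = TRUE, prob = p),
              df, samp_prob)

dim(sam)  # 200 rows, 4 columns
```

Naming the arguments (size, replace, prob) instead of passing them by position makes it harder to mix up which data frame supplies the values and which supplies the weights.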