Weighted random sampling for Monte Carlo simulation in R

I would like to run a Monte Carlo simulation. I have a data.frame where rows are unique IDs which have a probability of association with one of the columns. The data entered into the columns can be treated as the weights for that probability. I want to randomly sample each row in the data.frame based on the weights listed for each row. Each row should only return one value per run. The data.frame structure looks like this:

ID,    X2000,  X2001,  X2002,  X2003,  X2004
X11,   0,      0,      0.5,    0.5,    0
X33,   0.25,   0.25,   0.25,   0.25,   0
X55,   0,      0,      0,      0,      1
X77,   0.5,    0,      0,      0,      0.5

For weighting, "X11" should either return X2002 or X2003, "X33" should have an equal probability of returning X2000, X2001, X2002, or X2003, should be equal with no chance of returning X2004. The only possible return for "X55" should be X2004.

The output data I am interested in are the IDs and the column that was sampled for that run, although it would probably be simpler to return something like this:

ID,    X2000,  X2001,  X2002,  X2003,  X2004
X11,   0,      0,      1,      0,      0
X33,   1,      0,      0,      0,      0
X55,   0,      0,      0,      0,      1
X77,   1,      0,      0,      0,      0

Solution

Your data.frame is transposed - the sample() function takes a probability vector. However, your probability vector is rowwise which means it's harder to extract from a data.frame.

To get around this - you can import your ID column as a row.name. This allows you to be able to access it during an apply() statement. Note the apply() will coerce the data.frame to a matrix which means only one data type is allowed. That's why the IDs needed to be rownames - otherwise we'd have a probability vector of characters instead of numerics.

mc_df <- read.table(
text = 
'ID    X2000  X2001  X2002  X2003  X2004
X11   0      0      0.5    0.5    0
X33   0.25   0.25   0.25   0.25   0
X55   0      0      0      0      1
X77   0.5    0      0      0      0.5'
                    , header = T
                    ,row.names = 1)

From there, can use the apply function:

apply(mc_df, 1, function(x) sample(names(x), size = 200, replace = T, prob = x))

Or you could make it fancy

apply(mc_df, 1, function(x) table(sample(names(x), size = 200, replace = T, prob = x)))

$X11

X2002 X2003 
  102    98 

$X33

X2000 X2001 X2002 X2003 
   54    47    64    35 

$X55

X2004 
  200 

$X77

X2000 X2004 
  103    97

Fancier:

apply(mc_df, 1, function(x) table(sample(as.factor(names(x)), size = 200, replace = T, prob = x)))
      X11 X33 X55 X77
X2000   0  51   0  99
X2001   0  50   0   0
X2002  91  57   0   0
X2003 109  42   0   0
X2004   0   0 200 101

Or fanciest:

prop.table(apply(mc_df
                 , 1
                 , function(x) table(sample(as.factor(names(x)), size = 200, replace = T, prob = x)))
           ,2)
       X11   X33 X55   X77
X2000 0.00 0.270   0 0.515
X2001 0.00 0.235   0 0.000
X2002 0.51 0.320   0 0.000
X2003 0.49 0.175   0 0.000
X2004 0.00 0.000   1 0.485