
R: How to sample a different column for each row of a dataframe?


I want to sample one column for each row of a data frame, with weights that differ from row to row. I have tried a few approaches and looked through similar questions, without success. A mock data frame and the expected output are shown below.

library(plyr)
set.seed(12345)
df1 <- mdply(data.frame(mean=c(10, 15, 12, 24)), rnorm, n = 5, sd = 1)
df1

Ideally I want a vectorized solution that samples one column from V1 to V5 for every row. The sampling weights for a row are that row's values in V1 to V5. The actual data frame could have a couple million rows. A sample output is shown below.

f_col <- c(10,15,12,24)
sampled_column <- c("V3", "V1", "V5", "V5")

output_df1 <- data.frame("mean" = f_col, "result" = sampled_column)
output_df1

Solution

  • sample() accepts a prob argument that gives a weight to each element being sampled; the weights need not sum to one. To draw once per row, call it on each row with apply():

    output_df1 <- data.frame("mean"=df1$mean, "result"=apply(df1[,-1], 1, function(x) {sample(names(x), 1, prob=x)}))
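
Since the question mentions a couple million rows, where a per-row apply() call may become slow, here is a minimal sketch of a row-vectorized alternative. It swaps in inverse-CDF sampling rather than sample(): build cumulative weights across the columns, draw one uniform number per row scaled by the row total, and count how many cumulative weights fall below the draw. The names w, cum_w, u and idx are illustrative, the mock matrix is built with base R instead of plyr (so its values differ from df1), and all weights are assumed to be non-negative.

```r
set.seed(12345)
means <- c(10, 15, 12, 24)

# Mock weights shaped like df1[, -1]: one row per mean, columns V1..V5
w <- t(sapply(means, function(m) rnorm(5, mean = m, sd = 1)))
colnames(w) <- paste0("V", 1:5)

# Cumulative weights along each row; the loop runs over the (few) columns,
# so every step is vectorized over all rows
cum_w <- w
for (j in seq_len(ncol(w))[-1]) {
  cum_w[, j] <- cum_w[, j - 1] + w[, j]
}

# One uniform draw per row, scaled to that row's total weight
u <- runif(nrow(w)) * cum_w[, ncol(w)]

# Index of the first column whose cumulative weight reaches the draw;
# u is recycled down the columns, so u[i] is compared against row i
idx <- rowSums(cum_w < u) + 1L

output_df1 <- data.frame(mean = means, result = colnames(w)[idx])
output_df1
```

With the real data, w would be as.matrix(df1[, -1]) and means would be df1$mean. The draws will not match the apply() version for a given seed, because the random numbers are consumed differently, but each column should be picked with probability proportional to its weight in that row.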