Assigning a random variable based on a probability table/data frame in R


I have a probability data frame like below, called ptable:

 unique_id color share 
         1   red  0.3  
         1  blue  0.7  
         2   red  0.4  
         3  blue  0.5  

I'd like to randomly assign a color to each row of another data frame, join_table (shown below), using the share values in the probability table as sampling weights.

unique_id count
         1    3  
         2    4 

I understand sample(), but I am stuck on how to look up the probabilities by the shared unique_id. My latest attempt was:

join_table %>% 
group_by(unqiue_id) %>% 
mutate(color= sample(ptable$race[unique_id==ptable$unique_id], 
                     size=n(), 
                     prob=ptable$share[nique_id==ptable$unique_id], 
                     replace=TRUE))

Any help would be great.


Solution

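  • There were two typos in the code, plus a reference to a column that does not exist:

    group_by(unqiue_id) should be group_by(unique_id),

    prob=ptable$share[nique_id==ptable$unique_id] should be prob=ptable$share[unique_id==ptable$unique_id], and

    ptable$race should be ptable$color, since ptable has no race column.

    With those fixes, this should work: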

    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    ptable <- tibble::tribble(
      ~unique_id, ~color, ~share,
      1,   "red",  0.3,
      1,  "blue",  0.7,
      2,   "red",  0.4,
      3,  "blue",  0.5)
    
    join_table <- tibble::tribble(
      ~unique_id, ~count,
      1,    3,  
      2,    4)
    
    join_table %>% 
      group_by(unique_id) %>% 
      mutate(color= sample(ptable$color[unique_id==ptable$unique_id], 
                           size=n(), 
                           prob=ptable$share[unique_id==ptable$unique_id], 
                           replace=TRUE))
    #> # A tibble: 2 × 3
    #> # Groups:   unique_id [2]
    #>   unique_id count color
    #>       <dbl> <dbl> <chr>
    #> 1         1     3 blue 
    #> 2         2     4 red
    

    Created on 2022-03-01 by the reprex package (v2.0.1)
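
The expression unique_id==ptable$unique_id in the answer compares the group's whole unique_id vector against ptable$unique_id, so it leans on R's recycling rules. That is fine here because each group has a single row, but it may warn or pick up NA candidates if a group has several rows. As a hedged alternative, the sketch below looks up the candidates by a single id in base R. The helper name draw_colors is hypothetical (it is not part of the original answer or of any package), and returning NA for an id missing from ptable is an assumption about the desired behaviour.

```r
ptable <- data.frame(
  unique_id = c(1, 1, 2, 3),
  color     = c("red", "blue", "red", "blue"),
  share     = c(0.3, 0.7, 0.4, 0.5),
  stringsAsFactors = FALSE
)
join_table <- data.frame(unique_id = c(1, 2), count = c(3, 4))

# Hypothetical helper: draw `n` colours for one id, weighted by `share`.
draw_colors <- function(id, n, ptable) {
  rows <- ptable[ptable$unique_id == id, ]
  # Assumption: ids that are absent from ptable get NA rather than an error.
  if (nrow(rows) == 0) return(rep(NA_character_, n))
  # Sample row positions instead of values, so a single candidate is safe.
  # `prob` holds weights; sample.int() does not require them to sum to 1.
  idx <- sample.int(nrow(rows), size = n, replace = TRUE, prob = rows$share)
  rows$color[idx]
}

set.seed(1)  # only to make the random draw repeatable
# One draw per row of join_table, looked up by that row's id.
join_table$color <- vapply(
  join_table$unique_id, draw_colors, character(1),
  n = 1, ptable = ptable
)
```

unique_id 2 has only "red" as a candidate, so its draw is always "red"; unique_id 1 gets "blue" with weight 0.7 and "red" with weight 0.3. The same helper could be called inside a grouped mutate() with the group's first id, for example draw_colors(unique_id[1], n(), ptable).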