Search code examples
rrandomdplyrprobabilitysample

Sample Multiple Columns Without Repeats R dplyr


I am trying to solve a problem where I take a random sample based on probability with 5 observations per row where I observe a color out of a possible set of colors, exclude the observed color from the next observation and repeat. Colors can repeat in any given column, but not in the same row.

Here is how I have approached the problem:

library(tidyverse)

data <- tibble(obsId = 1:100)

colors <- tibble(color = c('red', 'blue', 'white', 'yellow', 'green', 'orange', 
                           'gray', 'brown', 'purple', 'black', 'pink', 'navy', 
                           'maroon'), 
                  prob = c(0.85, 0.85, 0.75, 0.75, 0.65, 0.5, 0.5, 0.5, 0.4, 
                           0.4, 0.25, 0.15, 0.15))

data <- data %>% 
      mutate(color1 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = T),
             color2 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = T),
             color3 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = T),
             color4 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = T),
             color5 = sample(x = colors$color, size = n(), 
                          prob  = colors$prob, replace = T)

The issue I have is that color 2 will be equal to color 1 (and so forth) in certain rows. Is there any easy way to resolve this?


Solution

  • How about replicating (nrow(data) times) the sampling of 5 items:

    setNames(
      cbind(
        data,
        t(replicate(nrow(data), sample(colors$color, 5, prob=colors$prob)))
      ), c("obsID", paste0("color",1:5))
    )
    

    And here is a data.table version:

    library(data.table)
    
    f <- function(c, p,n=5) setNames(as.list(sample(c,n,prob=p)), paste0("color",1:n))
    
    setDT(data)[, f(colors$color, colors$prob), obsId]
    

    Output:

        obsID color1 color2 color3 color4 color5
    1       1 yellow   gray  white    red  green
    2       2   blue  white    red  black orange
    3       3  white   blue    red purple yellow
    4       4 orange   blue maroon    red yellow
    5       5   blue yellow   gray  green  brown
    6       6  green  black   gray orange  white
    7       7 yellow  white    red   blue  green
    8       8 yellow  black  green   pink  white
    9       9 orange purple    red  white   blue
    10     10   pink yellow  white orange   blue
    11     11  white    red   blue orange purple
    12     12    red  green  brown yellow purple
    13     13    red   blue  brown  green  white
    14     14    red orange  green   blue   gray
    15     15  white   blue    red   pink yellow
    16     16    red   blue  white purple  green
    17     17   blue orange  brown  white  green
    18     18  black   gray  brown yellow purple
    19     19   pink   blue  green orange   gray
    20     20 purple   pink yellow  brown    red
    21     21  green  black  white   blue   pink
    22     22  black   blue   pink   gray maroon
    23     23   blue    red  brown orange yellow
    24     24   gray    red purple  brown orange
    25     25 purple  brown  green orange yellow
    26     26   blue  white orange  green    red
    27     27    red   blue orange  white   gray
    28     28  white  black yellow   navy    red
    29     29 orange   blue purple  brown  green
    30     30 orange   blue purple  green  white
    31     31   blue    red purple yellow  white
    32     32  white   navy orange purple  brown
    33     33 orange   blue purple   gray yellow
    34     34    red   pink   blue yellow  green
    35     35  brown   blue yellow   gray  white
    36     36   gray  green yellow purple    red
    37     37  green orange yellow   navy  white
    38     38  brown  green purple  black orange
    39     39   gray  black  green   pink  white
    40     40  white  green yellow   blue    red
    41     41 orange  black   gray maroon  white
    42     42   blue  white   pink  brown    red
    43     43 yellow purple   gray   navy  green
    44     44  white    red  green yellow   pink
    45     45    red  green   pink yellow  black
    46     46 orange  white    red   gray yellow
    47     47  brown purple   pink  black    red
    48     48   pink   blue  green  brown  white
    49     49   blue  green  brown orange yellow
    50     50    red  black  green orange   blue
    51     51   gray    red  green  white yellow
    52     52   pink  black yellow    red  brown
    53     53  white  green   navy  black   blue
    54     54    red orange yellow  white  green
    55     55  brown   gray purple yellow    red
    56     56  brown yellow  black purple orange
    57     57   pink  black   gray  white orange
    58     58   blue  brown  black purple  green
    59     59    red  green   gray  white   blue
    60     60 orange maroon yellow  green   gray
    61     61   pink  white   blue orange   gray
    62     62  brown    red  white   gray   blue
    63     63  black    red   blue yellow   navy
    64     64  green maroon    red  black   blue
    65     65  brown yellow    red purple  green
    66     66   gray  brown   blue  green yellow
    67     67   blue yellow maroon purple orange
    68     68    red   gray   blue  black  white
    69     69   gray yellow  white  brown orange
    70     70  brown    red  white yellow maroon
    71     71  black    red  white orange yellow
    72     72   navy   blue  green   gray  black
    73     73   gray orange  brown   blue    red
    74     74  black   pink    red yellow   blue
    75     75   gray  black   blue  green yellow
    76     76   blue yellow  green   pink    red
    77     77 yellow   blue    red  green orange
    78     78  brown  white   gray   navy orange
    79     79  brown   blue  white  green   navy
    80     80  white   gray yellow  brown  green
    81     81   gray  brown  white yellow    red
    82     82 orange yellow   blue  white  green
    83     83   gray   blue  brown  white  green
    84     84 orange   blue  green  brown   gray
    85     85   pink orange   blue  brown    red
    86     86    red orange  green yellow   blue
    87     87  black  brown   navy  white yellow
    88     88  white  black  green purple   blue
    89     89 yellow  green    red   gray maroon
    90     90 purple maroon  brown   gray yellow
    91     91  white   pink   blue    red  black
    92     92   blue  black    red   pink  white
    93     93    red  white  brown  black  green
    94     94    red   pink   gray   blue purple
    95     95 orange  green  white   gray    red
    96     96    red   gray  green yellow purple
    97     97  white  brown purple yellow   blue
    98     98 purple    red orange yellow  green
    99     99  black  white   gray yellow    red
    100   100 yellow  white   blue purple   gray
    

    If colors has a group column, and data has a group column, we can approach in a slightly different way:

    library(data.table)
    f <- function(g, n=5) {
      c = colors[group==g, color]
      p = colors[group==g, prob]
      setNames(as.list(sample(c,n,prob=p)), paste0("color",1:n))
    } 
    
    setDT(colors)
    setDT(data)[, f(.BY$group), .(obsId,group)]