
Generating two columns of random values (1, 2, or 3) that differ within each row


I want to assign 3 readers to a list of entries with ~1500 rows. Each row needs to be surveyed twice, but not by the same person. My idea was to create two new columns in the data set, each filled at random with 1, 2, or 3 for the respective readers. However, the two numbers must be different within each row.

Anyone got an easy fix for that in R?


Solution

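  • Here is a base R function. It builds every ordered pair of distinct readers and then samples n of those pairs with replacement.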

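    readers <- function(r, n){
      # All ordered pairs of readers 1..r
      ex <- expand.grid(Reader.1 = seq_len(r), Reader.2 = seq_len(r))
      # Drop pairs where the same reader appears twice
      ex <- ex[ex[, 1] != ex[, 2], ]
      # Draw n of the remaining pairs at random, with replacement
      ex <- ex[sample(nrow(ex), n, TRUE), ]
      row.names(ex) <- NULL
      ex
    }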
    
    set.seed(2020)
    readers(3, n = 15)
    #   Reader.1 Reader.2
    #1         3        2
    #2         3        2
    #3         2        3
    #4         2        1
    #5         2        1
    #6         3        2
    #7         3        1
    #8         2        3
    #9         2        1
    #10        1        3
    #11        3        1
    #12        3        1
    #13        2        3
    #14        1        3
    #15        3        1
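
    To attach the assignments to the actual data set, the result can be column-bound to it. The following is a minimal sketch: the data frame `entries` and its `id` column are hypothetical stand-ins for the real data, and `readers()` is repeated only so that the snippet runs on its own.

```r
# Same function as above, repeated so the snippet is self-contained
readers <- function(r, n){
  ex <- expand.grid(Reader.1 = seq_len(r), Reader.2 = seq_len(r))
  ex <- ex[ex[, 1] != ex[, 2], ]
  ex <- ex[sample(nrow(ex), n, TRUE), ]
  row.names(ex) <- NULL
  ex
}

# Hypothetical stand-in for the real data set with ~1500 rows
entries <- data.frame(id = seq_len(1500))

set.seed(2020)
# Add one pair of readers per row
entries <- cbind(entries, readers(3, nrow(entries)))

# Sanity check: no row is assigned the same reader twice
stopifnot(all(entries$Reader.1 != entries$Reader.2))

head(entries)
```

    Because the pairs are sampled independently, the number of rows per reader is only approximately equal; `table(entries$Reader.1)` and `table(entries$Reader.2)` show the actual split.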
    

    Edit

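    Here is another solution. It cycles through the readers in the first column, so Reader.1 is evenly balanced, and draws Reader.2 at random from the other two readers. As written, the index vectors and candidate sets are hard-coded for three readers, so it only works with r = 3.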

    readers2 <- function(r, n){
      # Reader.1 cycles 1, 2, 3, 1, 2, 3, ...
      df <- data.frame(Reader.1 = rep(seq_len(r), length.out = n))
      # Row positions where Reader.1 is 1, 2 and 3, respectively
      i1 <- seq(1, n, by = 3)
      i2 <- seq(2, n, by = 3)
      i3 <- seq(3, n, by = 3)
      df$Reader.2 <- NA_integer_
      # For each group, draw Reader.2 from the two other readers
      df$Reader.2[i1] <- sample(2:3, length(i1), TRUE)
      df$Reader.2[i2] <- sample(c(1L,3L), length(i2), TRUE)
      df$Reader.2[i3] <- sample(1:2, length(i3), TRUE)
      df
    }
    
    set.seed(2020)
    df <- readers2(3, 1500)
    table(df$Reader.1)
    #
    #  1   2   3 
    #500 500 500 
    
    table(df$Reader.2)
    #
    #  1   2   3 
    #505 479 516 
    
    table(df)
    #        Reader.2
    #Reader.1   1   2   3
    #       1   0 239 261
    #       2 245   0 255
    #       3 260 240   0
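
    If the number of readers might change, the same idea can be written without hard-coding three groups. This is only a sketch under that assumption, not part of the original answer: the name `readers3` is made up here, and it swaps the per-group `sample()` calls for a random non-zero offset added to Reader.1 modulo r.

```r
readers3 <- function(r, n){
  stopifnot(r >= 2)
  # Cycle through the readers so Reader.1 is as balanced as possible
  reader1 <- rep(seq_len(r), length.out = n)
  # Random shift between 1 and r - 1; a shift of 0 would repeat the reader
  offset <- sample.int(r - 1, n, replace = TRUE)
  # Wrap around so the result stays within 1..r
  reader2 <- (reader1 + offset - 1L) %% r + 1L
  data.frame(Reader.1 = reader1, Reader.2 = as.integer(reader2))
}

set.seed(2020)
df3 <- readers3(3, 1500)

# Sanity check: the two readers differ in every row
stopifnot(all(df3$Reader.1 != df3$Reader.2))

table(df3)
```

    Since the offset is never a multiple of r, Reader.2 can never equal Reader.1, and each of the other readers is equally likely.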