
R: Random values from one column in 5 columns


I have a dataframe (df) containing approximately 100 soccer player numbers (the number increases if more players sign up). Each player_number consists of 6 digits (e.g. 178530).

Every player should review 5 other players, so that eventually every player is reviewed by 5 others. Therefore I would like to randomly assign 5 different player numbers (from the player_number column) to each player_number. To prevent players from being assigned to review themselves, or from having to review the same player twice (or more), each player_number should occur only once in every column and in every row. The dataframe should look like this:

player_number  review1  review2  review3  review4  review5
178530         207145   655600   443274   604060   804226
245678         947821   214525   332324   174589   868954
…

Player 178530 needs to review players 207145, 655600 etc.

For the review1 column, I have used:

    set.seed(1)
    df$review1 <- sample(df$player_number, nrow(df), replace=F)

This works for review1, but applying it to the other review columns leads to duplicate player numbers within several rows. Can anyone help me out so that each player_number occurs only once in every column and in every row? Thanks in advance.
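The duplication is easy to reproduce. Sampling each review column independently makes every column a valid permutation, but nothing stops two columns (or the player_number column itself) from agreeing in a given row. Below is a minimal sketch that counts the affected rows; the example data are made up and `row_has_dup` is just an illustrative variable name.

```r
set.seed(1)
# Made-up example data: 100 distinct 6-digit player numbers
df <- data.frame(player_number = sample(111111:999999, 100))

# Naive approach: shuffle the same column independently five times
for (k in 1:5) {
  df[[paste0("review", k)]] <- sample(df$player_number, nrow(df), replace = FALSE)
}

# Flag rows in which any player number appears more than once
row_has_dup <- apply(df, 1, function(r) anyDuplicated(r) > 0)
sum(row_has_dup)  # number of rows that violate the requirement
```

Every column passes a duplicate check on its own, so the problem only shows up when the rows are inspected.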

Edit: in a previous version I simplified the player_number too much (1:100)


Solution

  • You could write a function for that. The idea is to take your vector of 100 player numbers, randomly sample 5 distinct starting positions, and build 5 new vectors by rotating the original vector so that it begins at each of those positions. Binding these vectors as columns gives a result in which no ID appears more than once in any row or column.

    For example, suppose you have the numbers 1 to 5 (in that order) and want to assign 3 of the numbers to each of them, with no number appearing more than once in any row or column:

    1 3 2 5
    2 4 3 1
    3 5 4 2
    4 1 5 3
    5 2 1 4
    

    This is the function that does that.

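    play <- function(v, i){
      # i distinct starting positions; position 1 is excluded because it
      # would reproduce v itself and pair every ID with itself
      starts <- sample(2:length(v), i, replace=FALSE)
      v2 <- v
      for(m in 1:i){
        # rotate v so that it begins at position starts[m]
        v2 <- cbind(v2, c(v[starts[m]:length(v)], v[seq_len(starts[m]-1)]))
      }
      colnames(v2) <- c('id', paste0('R', 1:i))
      return(v2)
    }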
    

    Try it.

    play(1:5, 3)
    

    Since the question asks about a dataframe, here is a similar function that takes one.

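    playDF <- function(df, i){
      ids <- df[[1]]   # player numbers: first column of df
      n <- length(ids)
      # i+1 distinct starting positions: one for the player column, i for the reviews
      starts <- sample(1:n, i+1, replace=FALSE)
      sq2 <- NULL
      for(m in 1:(i+1)){
        # rotate ids so that it begins at position starts[m]
        sq2 <- cbind(sq2, c(ids[starts[m]:n], ids[seq_len(starts[m]-1)]))
      }
      sq2 <- as.data.frame(sq2)
      colnames(sq2) <- c('player_number', paste0('review', 1:i))
      return(sq2)
    }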
    

    I've added example data for your problem. Run the function and apply it to the data.

    df <- data.frame(player_number=sample(111111:999999, 100, replace=FALSE))
    playDF(df, 5)
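
To check a result, a small helper along the following lines can confirm that no player number repeats within any row or column. This is only a sketch and not part of the functions above: `check_assignment` and `assign_reviews` are hypothetical names. Shuffling the IDs with `sample.int()` before shifting is an added assumption, intended to make the assignment independent of the order in which the players appear in the input.

```r
# Hypothetical helper: TRUE when no value repeats within any row
# or any column of the assignment.
check_assignment <- function(m) {
  m <- as.matrix(m)
  rows_ok <- all(apply(m, 1, function(r) anyDuplicated(r) == 0))
  cols_ok <- all(apply(m, 2, function(col) anyDuplicated(col) == 0))
  rows_ok && cols_ok
}

# Sketch of the same shifting idea on a plain vector of distinct IDs.
# Assumption: the IDs are shuffled first, then shifted cyclically.
assign_reviews <- function(ids, k) {
  n <- length(ids)
  stopifnot(k < n, anyDuplicated(ids) == 0)
  ids <- ids[sample.int(n)]        # random order of the players
  shifts <- sample.int(n - 1, k)   # k distinct shifts in 1..(n-1)
  # Column j holds the ID that sits shifts[j] positions further on (wrapping around)
  out <- sapply(shifts, function(s) ids[((seq_len(n) - 1 + s) %% n) + 1])
  out <- data.frame(ids, out)
  names(out) <- c("player_number", paste0("review", seq_len(k)))
  out
}

set.seed(1)
res <- assign_reviews(sample(111111:999999, 100), 5)
head(res)
check_assignment(res)
```

Because every review column is a permutation of the player numbers, each player is also reviewed exactly 5 times. The shifts are drawn from 1 to n-1, so a shift of 0 (a player reviewing themselves) cannot occur.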