
Random sampling in R within a categorical variable


Suppose I have a data frame with a categorical variable of n classes and a numerical variable. I need to randomize the numerical variable within each category. For example, consider the following table:

Col_1    Col_2
A        2
A        5
A        4
A        8
B        1
B        4
B        9
B        7

When I tried the sample() function in R, it shuffled the values across both categories instead of within each one. Is there a function that produces output like the following? (Sampling with or without replacement is fine.)

Col_1    Col_2
A        8
A        4
A        2
A        5
B        9
B        7
B        4
B        1

Solution

  • You could sample row numbers within groups. In base R, we can use ave():

    df[with(df, ave(seq_len(nrow(df)), Col_1, FUN = sample)), ]
    
    #  Col_1 Col_2
    #2     A     5
    #4     A     8
    #1     A     2
    #3     A     4
    #7     B     9
    #5     B     1
    #8     B     7
    #6     B     4
    

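    In dplyr, we can use sample_n():

    library(dplyr)
    # n() is the size of the current group, so every row is drawn once per group.
    # sample_n() is superseded; slice_sample(prop = 1) is the dplyr >= 1.0.0 equivalent.
    df %>% group_by(Col_1) %>% sample_n(n())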
    

    data

    df <- structure(list(Col_1 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L), .Label = c("A", "B"), class = "factor"), Col_2 = c(2L, 5L, 
    4L, 8L, 1L, 4L, 9L, 7L)), class = "data.frame", row.names = c(NA, -8L))
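
As a variation on the base R approach, you could also shuffle the Col_2 values in place rather than reordering rows. The sketch below is only an illustration: `shuffle` is a hypothetical helper (not part of base R), and the seed is arbitrary. It indexes with `sample.int()` because `sample(x)` treats a single number `x >= 1` as "draw from 1:x", which would give wrong results for a category that has only one row.

```r
# Example data from the question (Col_1 stored as character for brevity).
df <- data.frame(
  Col_1 = rep(c("A", "B"), each = 4),
  Col_2 = c(2L, 5L, 4L, 8L, 1L, 4L, 9L, 7L)
)

# Hypothetical helper: permute x by index so a length-1 group is returned as-is.
# Plain sample(x) with a single number x >= 1 would draw from 1:x instead.
shuffle <- function(x, replace = FALSE) {
  x[sample.int(length(x), replace = replace)]
}

set.seed(1)  # arbitrary seed, only to make the illustration repeatable

# Without replacement: each group keeps its values, in a new order.
shuffled <- df
shuffled$Col_2 <- ave(df$Col_2, df$Col_1, FUN = shuffle)

# With replacement: values may repeat within a group.
resampled <- df
resampled$Col_2 <- ave(df$Col_2, df$Col_1,
                       FUN = function(x) shuffle(x, replace = TRUE))
```

Here Col_1 is left untouched and only Col_2 changes, so the rows stay in their original order. If you need the whole row to move together (for example, when there are more columns), the row-index approach above is the better fit.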