
Apply hierarchical shuffling/sorting to a data frame in R


I have a data frame in which I want to shuffle (or, more generally, sort) the values hierarchically, but I am stuck. Here is my example:

library(plyr)
set.seed(123)
df <- data.frame(a = rep(letters[1:4], each = 10), b = rnorm(40))
> head(df)
  a          b
1 a -0.1264280
2 a  0.7284234
3 a -1.8782385
4 a  0.2530623
5 a  0.7577013
6 a -0.9339964

In this example, I want to shuffle (sample) the values in column b, but only within each letter: a value that belongs to letter a in column a may move to a different a row, but never to a b, c or d row.

I tried ddply(df, c('a'), b = sample(b)), but it didn't work.


Solution

  • With dplyr, group the data by a with group_by(a), then shuffle b within each group using mutate(b = sample(b)):

    library(dplyr)
    head(df, 10)
       a           b
    1  a -0.56047565
    2  a -0.23017749
    3  a  1.55870831
    4  a  0.07050839
    5  a  0.12928774
    6  a  1.71506499
    7  a  0.46091621
    8  a -1.26506123
    9  a -0.68685285
    10 a -0.44566197
    
    df %>% group_by(a) %>% mutate(b = sample(b))
    # A tibble: 40 x 2
    # Groups:   a [4]
       a           b
       <chr>   <dbl>
     1 a      1.56  
     2 a      0.461 
     3 a      0.0705
     4 a      1.72  
     5 a     -0.560 
     6 a     -0.446 
     7 a     -1.27  
     8 a      0.129 
     9 a     -0.230 
    10 a     -0.687 
    # ... with 30 more rows
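
For comparison, the same within-group shuffle can be sketched without dplyr. The snippet below is an illustration rather than part of the original answer: it uses base R's ave() to apply sample() to b within each level of a, then checks that every letter still holds the same set of values. The names shuffled, same_values and shuffled_plyr are made up for this example. The last part shows one likely fix for the ddply() attempt in the question, on the assumption that the call failed because no function (such as transform) was passed to ddply(); it only runs if plyr is installed.

```r
set.seed(123)
df <- data.frame(a = rep(letters[1:4], each = 10), b = rnorm(40))

# Base R: ave() applies FUN to the values of b within each level of a
# and puts the results back in the original row positions.
shuffled <- df
shuffled$b <- ave(df$b, df$a, FUN = sample)

# Check: each letter still holds exactly the same set of values.
same_values <- all(mapply(
  function(x, y) isTRUE(all.equal(sort(x), sort(y))),
  split(df$b, df$a),
  split(shuffled$b, shuffled$a)
))
same_values

# plyr variant of the attempt in the question: ddply() is given a function
# (here transform) that evaluates b = sample(b) within each piece.
if (requireNamespace("plyr", quietly = TRUE)) {
  shuffled_plyr <- plyr::ddply(df, "a", transform, b = sample(b))
}
```

One caveat applies to all of these variants: when a group contains a single numeric value of 1 or more, sample() treats that value as the upper end of a range instead of returning it unchanged, so groups of size one may need special handling.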