
Shuffle/permute values within specific columns, for a specific factor level of group-variable


I have a data frame like the one below. I would like to shuffle the values in columns V1, V2 and V3 within each of the factor levels A1, A2, B1 and B2.

n <- 1:10
df <- data.frame(factor = c("A1","A1","A1","A2","A2","A2",
                            "B1","B1","B1","B2","B2","B2"),
                 as.data.frame(sapply(1:3, function(i) sample(n, 12, replace = TRUE))))

   factor V1 V2 V3
1      A1  8  1  1
2      A1  7  2  9
3      A1  4  5  2
4      A2  6  5  2
5      A2  8  3  4
6      A2  1  9  3
7      B1  5  6  8
8      B1 10  4  6
9      B1  6  1  9
10     B2  4  6  7
11     B2  7  5  8
12     B2 10  2  7

I would like it to look like this:

   factor V1 V2 V3
1      A1  4  1  2
2      A1  8  5  1
3      A1  7  2  9
4      A2  8  9  2
5      A2  1  3  3
6      A2  6  5  4
7      B1  5  4  6
8      B1  6  6  8
9      B1 10  1  9
10     B2 10  6  8
11     B2  4  2  7
12     B2  7  5  7

I would ideally like to change the columns within the dataframe - not to add columns onto it. I have tried different options I found on this page such as:

require(plyr)
df1<- ddply(df, .(factor),summarize, ans=sample(V1))
or
df2<-transform(df, new.V1=ave(c(V1), factor, FUN=function(b) sample(b)))

Both work fine for changing a single column, but in both cases I cannot get it to sample several columns at once. df1 generates a new column without the rest of the old data frame, and df2 attaches the sampled column onto the old one. So in a way I prefer df1, but that doesn't help if I can't get it to do several columns at once. There must be a simple solution to this, but I have searched Stack Overflow up and down and can't seem to find one. I'd really appreciate your help.


Solution

  • You already have the approach down--you just need to figure out how to apply it across multiple columns. For this, I would suggest lapply, like this...

    First, your sample data (made reproducible with set.seed):

    set.seed(1)
    n <- 1:10
    df <- data.frame(factor = c("A1","A1","A1","A2","A2","A2",
                                "B1","B1","B1","B2","B2","B2"),
                     as.data.frame(
                       sapply(1:3, function(i)
                         sample(n, 12, replace = TRUE))))
    df
    #    factor V1 V2 V3
    # 1      A1  3  7  3
    # 2      A1  4  4  4
    # 3      A1  6  8  1
    # 4      A2 10  5  4
    # 5      A2  3  8  9
    # 6      A2  9 10  4
    # 7      B1 10  4  5
    # 8      B1  7  8  6
    # 9      B1  7 10  5
    # 10     B2  1  3  2
    # 11     B2  3  7  9
    # 12     B2  2  2  7
    

    We'll work on a copy instead of directly modifying your original data.

    df_copy <- df ## Because the next step is destructive
    
    df_copy[-1] <- lapply(df_copy[-1], function(x) {
      ave(x, df_copy[[1]], FUN = sample)
    })
    df_copy
    #    factor V1 V2 V3
    # 1      A1  6  8  1
    # 2      A1  3  4  3
    # 3      A1  4  7  4
    # 4      A2  3 10  4
    # 5      A2  9  5  9
    # 6      A2 10  8  4
    # 7      B1  7  4  6
    # 8      B1  7 10  5
    # 9      B1 10  8  5
    # 10     B2  2  7  7
    # 11     B2  1  2  2
    # 12     B2  3  3  9
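
If you need this more than once, the same lapply/ave idea can be wrapped in a small function. What follows is only a sketch: the name shuffle_within and its arguments (data, group, cols) are made up for illustration and are not part of the answer above, and the sample data uses an arbitrary seed. It indexes with sample.int() rather than calling sample() on the values directly, because sample(x) on a single number x draws from 1:x, which would corrupt any group that has only one row.

```r
# Hypothetical helper (name and arguments are assumptions, not from the
# original answer): shuffle the chosen columns independently within each
# level of a grouping column, leaving every other column untouched.
shuffle_within <- function(data, group, cols = setdiff(names(data), group)) {
  data[cols] <- lapply(data[cols], function(x) {
    # Permute by position so a group of size 1 is returned unchanged.
    ave(x, data[[group]], FUN = function(v) v[sample.int(length(v))])
  })
  data
}

# Sample data shaped like the question's (arbitrary seed).
set.seed(42)
df <- data.frame(
  factor = rep(c("A1", "A2", "B1", "B2"), each = 3),
  V1 = sample(1:10, 12, replace = TRUE),
  V2 = sample(1:10, 12, replace = TRUE),
  V3 = sample(1:10, 12, replace = TRUE)
)

shuffled <- shuffle_within(df, "factor", cols = c("V1", "V2", "V3"))

# Sanity check: within each group, a column holds the same values as
# before, only (possibly) in a different order.
same_values <- function(col) {
  identical(
    lapply(split(df[[col]], df$factor), sort),
    lapply(split(shuffled[[col]], shuffled$factor), sort)
  )
}
stopifnot(all(vapply(c("V1", "V2", "V3"), same_values, logical(1))))
```

Because each column is shuffled by a separate call, the rows within a group are not kept together; that matches the desired output in the question, where V1, V2 and V3 are permuted independently.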