I have a data frame like the one below. I would like to shuffle the values in columns V1, V2 and V3 within each of the factor levels A1, A2, B1 and B2.
n <- 1:10
df <- data.frame(factor = c("A1","A1","A1","A2","A2","A2",
                            "B1","B1","B1","B2","B2","B2"),
                 as.data.frame(sapply(1:3, function(i) sample(n, 12, replace = TRUE))))
factor V1 V2 V3
1 A1 8 1 1
2 A1 7 2 9
3 A1 4 5 2
4 A2 6 5 2
5 A2 8 3 4
6 A2 1 9 3
7 B1 5 6 8
8 B1 10 4 6
9 B1 6 1 9
10 B2 4 6 7
11 B2 7 5 8
12 B2 10 2 7
I would like it to look like this:
factor V1 V2 V3
1 A1 4 1 2
2 A1 8 5 1
3 A1 7 2 9
4 A2 8 9 2
5 A2 1 3 3
6 A2 6 5 4
7 B1 5 4 6
8 B1 6 6 8
9 B1 10 1 9
10 B2 10 6 8
11 B2 4 2 7
12 B2 7 5 7
I would ideally like to change the columns within the dataframe - not to add columns onto it. I have tried different options I found on this page such as:
require(plyr)
df1<- ddply(df, .(factor),summarize, ans=sample(V1))
or
df2<-transform(df, new.V1=ave(c(V1), factor, FUN=function(b) sample(b)))
Both work fine for changing a single column, but in both cases I cannot get them to sample several columns at once. df1 contains only the new column, without the rest of the old data frame, and df2 attaches the sampled column to the old one. So in a way I prefer df1, but that doesn't help if I can't get it to do several columns at once. There must be a simple solution to this, but I have searched up and down Stack Overflow and can't seem to find one. I'd really appreciate your help.
You already have the approach down; you just need to figure out how to apply it across multiple columns. For this, I would suggest lapply, like this.
First, your sample data (but reproducible, with set.seed):
set.seed(1)
n <- 1:10
df <- data.frame(factor = c("A1","A1","A1","A2","A2","A2",
                            "B1","B1","B1","B2","B2","B2"),
                 as.data.frame(
                   sapply(1:3, function(i)
                     sample(n, 12, replace = TRUE))))
df
# factor V1 V2 V3
# 1 A1 3 7 3
# 2 A1 4 4 4
# 3 A1 6 8 1
# 4 A2 10 5 4
# 5 A2 3 8 9
# 6 A2 9 10 4
# 7 B1 10 4 5
# 8 B1 7 8 6
# 9 B1 7 10 5
# 10 B2 1 3 2
# 11 B2 3 7 9
# 12 B2 2 2 7
We'll work on a copy instead of directly modifying your original data.
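df_copy <- df  ## Because the next step is destructive
## Overwrite every column except the first (the grouping column)
df_copy[-1] <- lapply(df_copy[-1], function(x) {
  ## ave() applies sample() to x separately within each group
  ave(x, df_copy[[1]], FUN = sample)
})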
df_copy
# factor V1 V2 V3
# 1 A1 6 8 1
# 2 A1 3 4 3
# 3 A1 4 7 4
# 4 A2 3 10 4
# 5 A2 9 5 9
# 6 A2 10 8 4
# 7 B1 7 4 6
# 8 B1 7 10 5
# 9 B1 10 8 5
# 10 B2 2 7 7
# 11 B2 1 2 2
# 12 B2 3 3 9
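If you need this more than once, the same idea can be wrapped in a small function. The version below is only a sketch: `shuffle_within` and `same_values` are hypothetical names made up for this example, and the data is rebuilt in the same layout as above. It also permutes by position with `sample.int` instead of calling `sample` on the values directly, because `sample(x)` on a single number `x >= 1` draws from `1:x`, which would matter if some group had only one row. That cannot happen in your data, where every group has three rows.

```r
# Hypothetical helper (not part of base R or plyr): independently shuffle
# every column except the grouping column, within each group.
shuffle_within <- function(data, group_col) {
  value_cols <- setdiff(names(data), group_col)
  data[value_cols] <- lapply(data[value_cols], function(x) {
    # Permute positions rather than values, so a one-row group is
    # returned unchanged instead of being resampled from 1:x.
    ave(x, data[[group_col]], FUN = function(v) v[sample.int(length(v))])
  })
  data
}

set.seed(1)
df <- data.frame(
  factor = rep(c("A1", "A2", "B1", "B2"), each = 3),
  V1 = sample(1:10, 12, replace = TRUE),
  V2 = sample(1:10, 12, replace = TRUE),
  V3 = sample(1:10, 12, replace = TRUE)
)
shuffled <- shuffle_within(df, "factor")

# Sanity check: within each group, a column must hold the same values
# as before, only in a (possibly) different order.
same_values <- function(col) {
  identical(
    lapply(split(df[[col]], df$factor), sort),
    lapply(split(shuffled[[col]], shuffled$factor), sort)
  )
}
stopifnot(all(vapply(c("V1", "V2", "V3"), same_values, logical(1))))
```

The function returns a modified copy, so `df` itself is left untouched; assign the result back to `df` if you want to replace the original columns.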