I have been searching in vain to find a solution to efficiently combine specific columns from separate data frames or from separate objects within the same list into new data frames. I have 5 data frames that all have the same row names (e.g. x, y, z) and the same column names (V1, V2, V3,...V1000).
V1 V2 V3 V4... V1 V2 V3 V4...
x 1 5 8 9 x 2 7 9 5
y 2 7 4 8 y 4 6 5 6
z 4 4 5 9 z 3 4 8 7
What I need to do is to extract V1 from all five frames and make a new data frame with those five V1 columns, and then do the same for the remaining V2-V1000 to get something like:
V1 V1 V1 V1 V1 V2 V2 V2 V2 V2
x 1 5 8 9 5 x 2 7 9 5 5
y 2 7 4 8 8 y 4 6 5 6 4
z 4 4 5 9 7 z 3 4 8 7 7
Eventually, I need to apply another function to all 1000 of the newly created data frames so if there is a way to loop this merging process and place the 1000 new frames into a new set of frames, that would be ideal.
I have tried various forms of merge, cbind, sapply and other solutions I have found suggested here and elsewhere on the web. The best I've been able to come up with is getting all five data frames into a single list and then using sapply to merge V1 from each object using sapply(y, "[[",2) where y is the list and 2 corresponds to the V1 column in each object. However, I can't seem to get this looped. I could use this code to manually create the 1000 new data frames but that would take forever.
Any suggestions or directions to other answers that might work would be appreciated!
We can loop through the column names and extract the columns from the 'data.frame's and cbind
it.
nm1 <- paste0("V", 1:1000)
lst <- lapply(nm1, function(x)
cbind(df1[x], df2[x], df3[x], df4[x], df5[x]))
The output is a list
of data.frames
. It is better to keep them as a list
instead of creating separate objects in the global environment. But if we need so,
list2env(setNames(lst, paste0("df_new", seq_along(lst))),
envir = .GlobalEnv)
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:10, 3*1000,
replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
paste0("V", 1:1000))) )
df2 <- as.data.frame(matrix(sample(0:10, 3*1000,
replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
paste0("V", 1:1000))) )
df3 <- as.data.frame(matrix(sample(0:10, 3*1000,
replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
paste0("V", 1:1000))) )
df4 <- as.data.frame(matrix(sample(0:10, 3*1000,
replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
paste0("V", 1:1000))) )
df5 <- as.data.frame(matrix(sample(0:10, 3*1000,
replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
paste0("V", 1:1000))) )