I have a working prototype of what I want to do for the single-dataset case. I load one dataset from a CSV file into RStudio via:
df <- read.csv("0-11-3-462.csv", header = FALSE)
This dataset is 503 rows by 31 columns, and I want to rename all of the columns so that the 1st column is called Y and the 2nd through 31st columns are called X1 through X30, respectively. I did so using this simple code:
# change column names of all the columns in the dataframe 'df'
colnames(df) <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
"X10","X11", "X12", "X13","X14", "X15", "X16","X17",
"X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
"X26", "X27", "X28","X29", "X30")
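Rather than typing all 31 names, the same vector can be built with paste0(), which vectorises over 1:30. A minimal sketch; the name var_names and the small stand-in data frame are illustrative assumptions, not part of the original code:

```r
# Build the 31 names programmatically: "Y", then "X1" ... "X30".
var_names <- c("Y", paste0("X", 1:30))

length(var_names)       # 31
var_names[c(1, 2, 31)]  # "Y" "X1" "X30"

# Hypothetical stand-in with 31 columns; as.data.frame() names them
# V1 ... V31, just like read.csv(header = FALSE) does.
example_df <- as.data.frame(matrix(0, nrow = 2, ncol = 31))
colnames(example_df) <- var_names
```

The same var_names vector can then be reused wherever the literal list appears.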
That is the working prototype for the single-dataset case. In my real project script, I do not have one dataset stored in an object called df; instead, I have N datasets stored in a list called datasets, and I want to assign that same set of column names to each of them:
datasets <- lapply(filepaths_list, read.csv)
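The single-file call used header = FALSE, and the V1 ... V31 names printed further down suggest the files have no header row. lapply() forwards any extra arguments to the function it applies, so one way to keep that setting for every file is sketched below. The temporary files, their contents, and the way filepaths_list is built here are assumptions made only to keep the example self-contained:

```r
# Write two small header-less CSV files to a temporary directory
# (made-up data standing in for the real files).
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
write.table(matrix(1:6, nrow = 2), file.path(tmp_dir, "a.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(matrix(7:12, nrow = 2), file.path(tmp_dir, "b.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

# Hypothetical way of collecting the paths into filepaths_list
filepaths_list <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Arguments after the function are passed on to read.csv for each file
datasets <- lapply(filepaths_list, read.csv, header = FALSE)

colnames(datasets[[1]])  # "V1" "V2" "V3"
```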
I have tried the following:
lapply(datasets, function(i) {
colnames <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
"X10","X11", "X12", "X13","X14", "X15", "X16","X17",
"X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
"X26", "X27", "X28","X29", "X30") })
This ran, but it did not do what I wanted. Next, I tried:
lapply(datasets, function(i) {
colnames(datasets[[i]]) <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
"X10","X11", "X12", "X13","X14", "X15", "X16","X17",
"X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
"X26", "X27", "X28","X29", "X30") })
This returns:
> lapply(datasets, function(i) {
+ colnames(datasets[[i]]) <- VarNames })
Error in `*tmp*`[[i]] : invalid subscript type 'list'
Called from: FUN(X[[i]], ...)
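The error comes from how lapply() calls the function: each call receives an element of datasets (a data.frame), not its position, so datasets[[i]] ends up using a data.frame as a subscript. If index-based access is preferred, a for loop over seq_along() is one way to do it. A minimal sketch on a made-up two-element list (the data and var_names are illustrative assumptions):

```r
# Hypothetical list of two small data frames with default-style names
datasets <- list(
  data.frame(V1 = 1:2, V2 = 3:4, V3 = 5:6),
  data.frame(V1 = 7:8, V2 = 9:10, V3 = 11:12)
)
var_names <- c("Y", paste0("X", 1:2))  # "Y" "X1" "X2" for 3 columns

# Looping over positions makes i a whole number, so datasets[[i]] is valid.
# A for loop does not create a new function scope, so this assignment
# updates `datasets` itself rather than a local copy.
for (i in seq_along(datasets)) {
  colnames(datasets[[i]]) <- var_names
}

colnames(datasets[[2]])  # "Y" "X1" "X2"
```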
Then, I tried:
VarNames <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
"X10","X11", "X12", "X13","X14", "X15", "X16","X17",
"X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
"X26", "X27", "X28","X29", "X30")
> lapply(datasets, function(i) {
+ colnames(datasets[[i]][1, ]) <- VarNames })
Error in `*tmp*`[[i]] : invalid subscript type 'list'
Called from: FUN(X[[i]], ...)
And finally, I tried:
lapply(datasets, function(i) { colnames(datasets[[1]][1, ]) <- c("Y", "X1","X2", "X3",
"X4","X5", "X6", "X7",
"X8", "X9", "X10","X11",
"X12", "X13","X14",
"X15", "X16","X17",
"X18", "X19","X20",
"X21", "X22","X23",
"X24", "X25", "X26",
"X27", "X28","X29",
"X30") })
This ran, but didn't seem to change anything. I thought it was by far the most promising attempt, until I ran colnames(datasets[[1]][1, ]) in the Console and got:
> colnames(datasets[[1]][1, ])
[1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11"
[12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20" "V21" "V22"
[23] "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31"
These are the names I was trying to replace.
The following solution was recommended in a comment beneath this post:
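Two separate things stop that last attempt from renaming anything. The sketch below uses a hypothetical one-element list and relies on R's standard copy-on-modify semantics: writing a renamed one-row subset back into datasets[[1]] only replaces row 1's values by position, and inside a function any assignment to datasets creates a local copy.

```r
datasets <- list(data.frame(V1 = 1:2, V2 = 3:4))  # made-up data

# 1) datasets[[1]][1, ] is a one-row subset. Renaming it and writing it back
#    replaces the values of row 1 by position; the column names of the
#    stored data frame are left alone.
colnames(datasets[[1]][1, ]) <- c("Y", "X1")
colnames(datasets[[1]])  # still "V1" "V2"

# 2) Inside a function, assigning to datasets creates a local copy,
#    so the global list is not modified.
invisible(lapply(1, function(k) {
  colnames(datasets[[1]]) <- c("Y", "X1")
}))
colnames(datasets[[1]])  # still "V1" "V2"
```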
# change column names of all the columns in 'datasets'
datasets <- lapply(datasets, function(i) {
colnames(i) <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7", "X8", "X9",
"X10","X11", "X12", "X13","X14", "X15", "X16","X17",
"X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
"X26", "X27", "X28","X29", "X30") })
When printed, this now generates the output:
> head(datasets, n = 3)
[[1]]
[1] "Y" "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10"
[12] "X11" "X12" "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20" "X21"
[23] "X22" "X23" "X24" "X25" "X26" "X27" "X28" "X29" "X30"
[[2]]
[1] "Y" "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10"
[12] "X11" "X12" "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20" "X21"
[23] "X22" "X23" "X24" "X25" "X26" "X27" "X28" "X29" "X30"
[[3]]
[1] "Y" "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10"
[12] "X11" "X12" "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20" "X21"
[23] "X22" "X23" "X24" "X25" "X26" "X27" "X28" "X29" "X30"
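That is, each element of datasets is now just a character vector of column names, rather than one of the three 503-by-31 data frames it printed before, which was the correct output.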
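The printed character vectors follow from two R rules: a function returns the value of its last expression, and the value of an assignment such as colnames(d) <- value is value itself, not the modified data frame. A small sketch contrasting the two versions; the function names and data are hypothetical:

```r
example_df <- data.frame(V1 = 1:2, V2 = 3:4)  # made-up stand-in data

# Last expression is the assignment, so the names vector is returned
rename_without_return <- function(d) {
  colnames(d) <- c("Y", "X1")
}

# Last expression is d, so the renamed data frame is returned
rename_with_return <- function(d) {
  colnames(d) <- c("Y", "X1")
  d
}

result <- lapply(list(example_df), rename_without_return)
class(result[[1]])        # "character"

result2 <- lapply(list(example_df), rename_with_return)
colnames(result2[[1]])    # "Y" "X1"
```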
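lapply iterates over the elements of the list of data.frames datasets, so i is not an index but the actual data.frame. This means you can operate directly on the argument passed to the function. Two further details matter: the function has to return the modified data.frame (otherwise lapply collects the value of the colnames assignment, which is just the vector of names), and the renaming happens on a copy, so the result of lapply must be assigned to a variable. To make it clearer, I've renamed i to one_dataset: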
datasets_new_colnames <- lapply(datasets, function(one_dataset) {
colnames(one_dataset) <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
"X10","X11", "X12", "X13","X14", "X15", "X16","X17",
"X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
"X26", "X27", "X28","X29", "X30")
one_dataset
})
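Another option, not taken from the answer above, is stats::setNames(), which returns a renamed copy of its first argument, so there is no separate return line to forget. A sketch on made-up data, assuming the same 31 names:

```r
var_names <- c("Y", paste0("X", 1:30))

# Hypothetical stand-in: three data frames with 31 default-named columns
datasets <- lapply(1:3, function(k) as.data.frame(matrix(k, nrow = 5, ncol = 31)))

# setNames(object, nm) returns object with its names set to nm;
# lapply passes each data frame as object and var_names as nm.
datasets <- lapply(datasets, setNames, var_names)

colnames(datasets[[3]])[1:3]  # "Y" "X1" "X2"
```

Both versions produce a new list, so the result still has to be assigned back (here to datasets).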