How to change the column names of each of N datasets in your workspace iteratively (preferably using lapply)


I already have a working prototype for the single-dataset case. I load one CSV file into RStudio via:

df <- read.csv("0-11-3-462.csv", header = FALSE)

This dataset is 503 by 31, and I want to rename all of the columns so that the 1st column is called Y, and the 2nd through 31st columns are called X1 through X30 respectively. I did so using this simple code:

# change column names of all the columns in the dataframe 'df'
colnames(df) <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
                  "X10","X11", "X12", "X13","X14", "X15", "X16","X17", 
                  "X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
                  "X26", "X27", "X28","X29", "X30")
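
For reference, the same 31 names can be generated rather than typed out. This is only a sketch: the toy data frame below is a hypothetical stand-in for the real 503 by 31 dataset, and VarNames is just a convenient name for the vector.

```r
# Build "Y", "X1", ..., "X30" without typing every name by hand.
VarNames <- c("Y", paste0("X", 1:30))

# Toy stand-in (assumption) for the dataset read from the CSV file.
df <- as.data.frame(matrix(0, nrow = 3, ncol = 31))

# Same assignment as above, using the generated vector.
colnames(df) <- VarNames
```

Generating the names this way avoids a silent typo in one of the 31 strings.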

That is the working prototype for the single-dataset case. In my real project script, I do not have one dataset stored in an object called df; I have N datasets stored in a list called datasets, and I want to assign that same vector of column names to each of the data frames in that list:

datasets <- lapply(filepaths_list, read.csv)

I have tried the following:

lapply(datasets, function(i) {
  colnames <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
                  "X10","X11", "X12", "X13","X14", "X15", "X16","X17", 
                  "X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
                  "X26", "X27", "X28","X29", "X30") })

This ran, but it did not change the column names in datasets. Next, I tried:

lapply(datasets, function(i) {
  colnames(datasets[[i]]) <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
                  "X10","X11", "X12", "X13","X14", "X15", "X16","X17", 
                  "X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
                  "X26", "X27", "X28","X29", "X30") })

Which returns:

> lapply(datasets, function(i) {
+   colnames(datasets[[i]]) <- VarNames })
Error in `*tmp*`[[i]] : invalid subscript type 'list'
Called from: FUN(X[[i]], ...)

Then, I tried:

VarNames <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
              "X10","X11", "X12", "X13","X14", "X15", "X16","X17", 
              "X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
              "X26", "X27", "X28","X29", "X30")

> lapply(datasets, function(i) {
+   colnames(datasets[[i]][1, ]) <- VarNames })
Error in `*tmp*`[[i]] : invalid subscript type 'list'
Called from: FUN(X[[i]], ...)

And finally, I tried:

lapply(datasets, function(i) { colnames(datasets[[1]][1, ]) <- c("Y", "X1","X2", "X3", 
                                                       "X4","X5", "X6", "X7",
                                                       "X8", "X9", "X10","X11", 
                                                       "X12", "X13","X14", 
                                                       "X15", "X16","X17", 
                                                       "X18", "X19","X20", 
                                                       "X21", "X22","X23", 
                                                       "X24", "X25", "X26", 
                                                       "X27", "X28","X29", 
                                                       "X30") })

This ran, but it didn't seem to change anything. I thought this was the most promising attempt by far, because running colnames(datasets[[1]][1, ]) in the Console returned:

> colnames(datasets[[1]][1, ])
 [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11"
[12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20" "V21" "V22"
[23] "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31"

These are the names I am trying to replace.

Update: the following solution was recommended in a comment beneath this post:

# change column names of all the columns in 'datasets'
datasets <- lapply(datasets, function(i) { 
   colnames(i) <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7", "X8", "X9", "X10","X11", "X12", "X13","X14", "X15", "X16","X17", "X18", "X19","X20", "X21", "X22","X23", "X24", "X25", "X26", "X27", "X28","X29", "X30") })

When printed, this now generates the output:

> head(datasets, n = 3)
[[1]]
 [1] "Y"   "X1"  "X2"  "X3"  "X4"  "X5"  "X6"  "X7"  "X8"  "X9"  "X10"
[12] "X11" "X12" "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20" "X21"
[23] "X22" "X23" "X24" "X25" "X26" "X27" "X28" "X29" "X30"

[[2]]
 [1] "Y"   "X1"  "X2"  "X3"  "X4"  "X5"  "X6"  "X7"  "X8"  "X9"  "X10"
[12] "X11" "X12" "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20" "X21"
[23] "X22" "X23" "X24" "X25" "X26" "X27" "X28" "X29" "X30"

[[3]]
 [1] "Y"   "X1"  "X2"  "X3"  "X4"  "X5"  "X6"  "X7"  "X8"  "X9"  "X10"
[12] "X11" "X12" "X13" "X14" "X15" "X16" "X17" "X18" "X19" "X20" "X21"
[23] "X22" "X23" "X24" "X25" "X26" "X27" "X28" "X29" "X30"

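Each element is now just a character vector of names, rather than a 503 by 31 data frame as it was before. The earlier structure is the correct one, and it is what I want to keep.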
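
The behaviour can be reproduced on a small scale. The sketch below uses two hypothetical two-column data frames in place of the real datasets; the names toy and names_only are illustrative only.

```r
# Two toy data frames standing in for the real 503 by 31 datasets (assumption).
toy <- list(data.frame(V1 = 1:2, V2 = 3:4),
            data.frame(V1 = 5:6, V2 = 7:8))

# The assignment is the last expression in the function, so its value
# (the vector of names) is what lapply() collects for each element.
names_only <- lapply(toy, function(i) {
  colnames(i) <- c("Y", "X1")
})

class(names_only[[1]])  # "character", not "data.frame"
```

The original list toy is untouched, because the function only modifies its own local copy i.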


Solution

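  • lapply iterates over the elements of the list datasets, so i is not an index but the actual data.frame. That is why datasets[[i]] fails with "invalid subscript type 'list'", and it means you can operate directly on the argument passed to the function. The function must also end by returning the modified data.frame; otherwise lapply collects the value of the assignment, which is only the vector of names. To make it clearer, I've renamed i to one_dataset: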

    datasets_new_colnames <- lapply(datasets, function(one_dataset) {
      colnames(one_dataset) <- c("Y", "X1","X2", "X3", "X4","X5", "X6", "X7","X8", "X9",
                                 "X10","X11", "X12", "X13","X14", "X15", "X16","X17", 
                                 "X18", "X19","X20", "X21", "X22","X23", "X24", "X25",
                                 "X26", "X27", "X28","X29", "X30")
      one_dataset
    })
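
Two equivalent variants, sketched under the assumption that every data.frame in the list has exactly 31 columns. The toy list below is a hypothetical stand-in for datasets, and renamed_a and renamed_b are illustrative names.

```r
VarNames <- c("Y", paste0("X", 1:30))

# Toy list (assumption) standing in for the real `datasets`.
datasets <- replicate(3, as.data.frame(matrix(0, nrow = 2, ncol = 31)),
                      simplify = FALSE)

# Variant 1: setNames() renames its first argument and returns it,
# so no explicit return line is needed.
renamed_a <- lapply(datasets, setNames, nm = VarNames)

# Variant 2: iterate over positions, so that [[k]] really is an index.
renamed_b <- datasets
for (k in seq_along(renamed_b)) {
  colnames(renamed_b[[k]]) <- VarNames
}
```

Variant 2 is closest to the datasets[[i]] attempts in the question; it works here because k is an integer position rather than a data.frame.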