Tags: r, dataframe, eval, lapply, r-colnames

Change column names in list of list of data frames using lapply


This is a follow-up to this question: Create scatter plot with interval data in R

I would like to change the column names of the following data frames, which are part of a list of lists:

# other_list: a list of arbitrary length containing some data
myvar <- "myactualMeasurement"

lapply_output <- list()
for(i in seq_along(other_list)){
  lapply_output[[i]] <- lapply(other_list[[i]], function(item){
      out_df <- data.frame('MyItem' = item$MyItem,
                           'Measurement' = item$Measurement,
                           'Interval' = seq(floor(item$First), floor(item$Last)) + 0.5)
      return(out_df)
  })
}

As you can see, I'm assigning the names 'MyItem', 'Measurement' and 'Interval' to my columns. I would like to assign the name 'Measurement' using the variable "myvar" instead of doing it manually. I've already tried to use

eval(parse(text = myvar))

instead of 'Measurement' in my lapply structure, but that does not seem to work.

My current workaround is a nested loop which (re-)assigns the column name:

for(i in seq_along(other_list)){
  for(j in seq_along(lapply_output[[i]])){
    # Replace the "Measurement" column name with the value stored in myvar
    colnames(lapply_output[[i]][[j]])[which(names(lapply_output[[i]][[j]]) == "Measurement")] <- myvar
  }
}

I'm sure there has to be a neater way of doing this (preferably a one-liner inside the lapply structure), but I can't come up with a good solution.

An alternative could be (see Using lapply to change column names of a list of data frames):

new_col_name <- c("MyItem", myvar, "Interval")
for(i in 1:length(other_list)){
  newlist[[i]] <- lapply(lapply_output[[i]], setNames, nm = new_col_name)
}

But this 1) does not really do what it should (only the last list element is preserved) and 2) is not neat either.

Preferably, I would like to use something like

eval(parse(text = myvar))

in the original structure, without having to write much more additional naming code.


Solution

  • lapply already iterates over the elements of its input list, so there is no need to wrap it in a for loop with index variables. There is also no need to create an empty list beforehand, because lapply always returns a list.

    You can rename the column in one step, as below, using match to find the position of the old column name:

    outputVar <- "myactualMeasurement"
    inputVar <- "Measurement"
    
    outList <- lapply(other_list, function(item){

      out_df <- data.frame('MyItem' = item$MyItem,
                           'Measurement' = item$Measurement,
                           'Interval' = seq(floor(item$First), floor(item$Last)) + 0.5)

      inputvarIndex <- match(inputVar, colnames(out_df))
      colnames(out_df)[inputvarIndex] <- outputVar

      return(out_df)
    })
    

    I strongly suggest reading the documentation and examples in ?lapply thoroughly. Note that eval/parse, though seemingly convenient, can produce unexpected results.
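
As a possible alternative to renaming by index, the sketch below builds each data frame and then names its columns with setNames(), which takes the names as a character vector, so myvar can be used directly without eval/parse. It assumes other_list is nested two levels deep, as the loop in the question implies. The sample data and the names out_list and sublist are made up for illustration.

```r
# Hypothetical sample data: a list of lists whose innermost elements hold
# the fields used in the question (MyItem, Measurement, First, Last).
other_list <- list(
  list(
    list(MyItem = "a", Measurement = 1.5, First = 0.2, Last = 2.7),
    list(MyItem = "b", Measurement = 2.5, First = 1.0, Last = 1.9)
  ),
  list(
    list(MyItem = "c", Measurement = 3.5, First = 4.1, Last = 6.0)
  )
)
myvar <- "myactualMeasurement"

# The outer lapply walks the top-level list, the inner one each sub-list.
out_list <- lapply(other_list, function(sublist) {
  lapply(sublist, function(item) {
    out_df <- data.frame(
      MyItem = item$MyItem,
      Measurement = item$Measurement,
      Interval = seq(floor(item$First), floor(item$Last)) + 0.5
    )
    # Names are given as a character vector, so the variable works as-is.
    setNames(out_df, c("MyItem", myvar, "Interval"))
  })
})

names(out_list[[1]][[1]])
```

Because the nested lapply calls return the nested list themselves, no pre-allocated list or index variables are needed, and the result keeps the same two-level shape as the input.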