Add a column to dataframes in a list based on the existence of other columns in R


I am trying to add a new column to every dataframe in a long list (~200 dataframes), based on which columns each dataframe already contains. Using modified and unmodified versions of the iris dataset as an example, I want to give each dataframe a new column called "species_fixed". The rules I am trying to follow are:

  1. If the column "Species" exists in the dataframe, copy its values into the new column "species_fixed".
  2. If the column "sp" exists in the dataframe, copy its values into the new column "species_fixed".
  3. If neither of these columns exists, make a "species_fixed" column that is all NAs.

Here was my attempt:

library(dplyr)

#Making a couple dataframes with various structures:

iris_1 <- iris %>% rename(sp = Species)
iris_2 <- iris %>% select(Sepal.Length, Sepal.Width)
iris_3 <- iris %>% mutate(species_2  = Species)

#Making them into a list:

iris_list <- list(iris, iris_1, iris_2, iris_3)

#Attempting to use lapply:

iris_list_fixed <- lapply(iris_list, function(q){
  species_fixed = mutate(ifelse('Species' %in% names(q), Species, ifelse(
'sp' %in% names(q), sp, "NA"))
})

I figure this must require some combination of lapply(), mutate(), ifelse() and potentially other functions, but I can't quite seem to land it.


Solution

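  • Here is a working version based on your example. The rowwise() step matters because ifelse() returns a result the same length as its test, and the test here ('Species' %in% names(df)) is a single TRUE or FALSE: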

    library(dplyr)
    
    #Making a couple dataframes with various structures:
    
    iris_1 <- iris %>% rename(sp = Species)
    iris_2 <- iris %>% select(Sepal.Length, Sepal.Width)
    iris_3 <- iris %>% mutate(species_2  = Species)
    
    #Making them into a list:
    
    iris_list <- list(iris, iris_1, iris_2, iris_3)
    
    #Adding species_fixed to every dataframe with lapply:
    
    iris_list_fixed <- lapply(iris_list, function(df){
      df %>%
        rowwise() %>%   # evaluate ifelse() one row at a time
        mutate(species_fixed = ifelse('Species' %in% names(df), as.character(Species),
                                      ifelse('sp' %in% names(df), as.character(sp), NA_character_))) %>%
        ungroup()       # drop the rowwise grouping again
    })
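
Since the column check is one decision per dataframe rather than one per row, the same rules could also be written with a plain if / else if chain, which avoids rowwise() altogether. Below is a minimal sketch of that alternative, not the approach from the answer above. The helper name add_species_fixed is made up for illustration, and it uses base R only so that it runs without dplyr loaded:

```r
# Hypothetical helper (name chosen for illustration): adds species_fixed
# from the first matching column, or all NA if neither column exists.
add_species_fixed <- function(df) {
  if ("Species" %in% names(df)) {
    df$species_fixed <- as.character(df$Species)
  } else if ("sp" %in% names(df)) {
    df$species_fixed <- as.character(df$sp)
  } else {
    # rep() keeps this safe even for a zero-row dataframe
    df$species_fixed <- rep(NA_character_, nrow(df))
  }
  df
}

# Rebuild the example list from the question using base R
iris_1 <- iris
names(iris_1)[names(iris_1) == "Species"] <- "sp"
iris_2 <- iris[, c("Sepal.Length", "Sepal.Width")]
iris_3 <- iris
iris_3$species_2 <- iris_3$Species

iris_list <- list(iris, iris_1, iris_2, iris_3)

iris_list_fixed <- lapply(iris_list, add_species_fixed)
```

Each element of iris_list_fixed remains an ordinary data.frame. With ~200 dataframes this is likely to be faster than a row-by-row approach, though that has not been measured here.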