r dataframe function matrix lapply

How do I implement a function that calls specific column names on a list of data frames in R? Error with lapply


I'm pretty new to R. I have a function that performs several calculations across the columns of a data frame and creates new columns to hold the results. I want to apply this function to every data frame in a list. However, when I use lapply, I get an error saying that the first column-name argument is missing, with no default.

I know this must be an issue with how I have written my function, but I am struggling to come up with a solution. How can I proceed?

#create example data frames; my real data frames are named similarly, with an identical name plus a unique id (i.e. example_df_uniqueidnumber), and each data frame has identically named columns

df1 <- data.frame(pt1_X = c(1,2,3), pt2_X = c(1,2,3), pt1_Y = c(1,2,3), pt2_Y =c(1,2,3))
df2 <- data.frame(pt1_X = c(1,2,3), pt2_X = c(1,2,3), pt1_Y = c(1,2,3), pt2_Y =c(1,2,3))


#create my example function
#NOTE: I call the data "data" (instead of df1 or df2), because I am unsure of what to use instead, as each file name is different due to the unique identifier 

calculate_angles1 <- function(data, pt1_X, pt1_Y, pt2_X, pt2_Y) {
  data$Mx <- (data[[pt1_X]] - data[[pt2_X]])
  data$My <- (data[[pt1_Y]] - data[[pt2_Y]])
    return(data)
}

#create my list of data frames
new_list <- list(df1, df2)


#use lapply to attempt to apply my function to my list of data frames 
AoA <- lapply(new_list, calculate_angles1)

After I run lapply, I receive this error message:

Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x,  : 
  argument "pt1_X" is missing, with no default

Solution

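  • The error occurs because lapply() passes each data frame to your function as the first argument (data) only. Nothing is supplied for pt1_X, pt1_Y, pt2_X and pt2_Y, so R reports the first one it needs as missing. Since the column names are identical in every data frame, you can drop those arguments and refer to the columns by their quoted names inside the function. So the function could be rewritten as:

     calculate_angles1 <- function(data) {
      # column names are fixed, so they are written as quoted strings
      data[["Mx"]] <- data[["pt1_X"]] - data[["pt2_X"]]
      data[["My"]] <- data[["pt1_Y"]] - data[["pt2_Y"]]
      # return the data frame with the two new columns
      data
    }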
    

    There are several ways to apply the function to a list of data frames. One is lapply(), as you tried:

    lapply

     new_list <- lapply(new_list,  calculate_angles1)
    

    Another is map() from the purrr package (part of the tidyverse), which I think is more straightforward. map() calls a function on each element of the list and returns a list, so you can use dplyr verbs on each data frame inside it. Here I call mutate() from dplyr to create the new columns, without defining a separate function:

    map

    library(tidyverse)
    new_list <- map(new_list, ~ mutate(.x, Mx=pt1_X-pt2_X, My=pt1_Y-pt2_Y))
    

    Both of these options produce the same output:

    > new_list
    [[1]]
      pt1_X pt2_X pt1_Y pt2_Y Mx My
    1     1     1     1     1  0  0
    2     2     2     2     2  0  0
    3     3     3     3     3  0  0
    
    [[2]]
      pt1_X pt2_X pt1_Y pt2_Y Mx My
    1     1     1     1     1  0  0
    2     2     2     2     2  0  0
    3     3     3     3     3  0  0
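
If you would rather keep the column names as arguments, as in the original function from the question, one option is to pass them through lapply(): any arguments given after the function are forwarded to every call. The following is only a minimal sketch, assuming the column names are supplied as character strings and reusing the example data from the question:

```r
# Example data frames with identically named columns (as in the question)
df1 <- data.frame(pt1_X = c(1, 2, 3), pt2_X = c(1, 2, 3),
                  pt1_Y = c(1, 2, 3), pt2_Y = c(1, 2, 3))
df2 <- data.frame(pt1_X = c(1, 2, 3), pt2_X = c(1, 2, 3),
                  pt1_Y = c(1, 2, 3), pt2_Y = c(1, 2, 3))
new_list <- list(df1, df2)

# Original signature: each pt* argument holds a column name as a string
calculate_angles1 <- function(data, pt1_X, pt1_Y, pt2_X, pt2_Y) {
  data$Mx <- data[[pt1_X]] - data[[pt2_X]]
  data$My <- data[[pt1_Y]] - data[[pt2_Y]]
  data
}

# Extra named arguments after the function are forwarded to each call,
# so every data frame in the list is processed with the same column names
AoA <- lapply(new_list, calculate_angles1,
              pt1_X = "pt1_X", pt1_Y = "pt1_Y",
              pt2_X = "pt2_X", pt2_Y = "pt2_Y")
```

This keeps the function reusable for data frames whose columns are named differently, at the cost of a longer lapply() call.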