Tags: r, list, ggplot2, tibble

How to run operations on tibbles and the columns/values in the tibbles, with the tibbles being in a list?


I am new to R, so apologies if the answer is obvious. I am trying to perform operations on tibbles and their values/columns while these tibbles are part of a list. Previously, I would load each of them manually as a data.frame (from CSV data) and perform the operations manually on that data.frame. This is tiresome, so I am trying to run all the operations in my script on all my data.frames at once. For example, what has worked for me so far is adding 0.7 to every element of the column named 'Temperature' in each tibble in the list. I did it like this:

for(i in seq_along(Data_List)) {Data_List[[i]]$Temperature <- Data_List[[i]]$Temperature + 0.7}

However, I would now like to perform other tasks. First, I need to divide my tibbles into sequences of rows. When I worked with one data.frame at a time, this is what I did:

df_Sitting <- df[1:12, ]
df_Standing <- df[13:26, ]
df_LigEx <- df[27:35, ]
df_VigEx <- df[36:42, ]
df_After <- df[43:54, ]

How do I adapt this for the list of all my tibbles/data.frames? Secondly, I want to compute descriptive statistics, the Pearson correlation and Lin's concordance correlation. Additionally, I created a ggplot and a Bland-Altman plot. I did it like this:

describe(df$Temperature)
describe(df$Temp_core)
cor.test(df$Temp_core, df$Temperature)
library(epiR)
epi.ccc(df$Temp_core, df$Temperature, ci = "z-transform", 
        conf.level = 0.95, rep.measure = FALSE, subjectid)
mdata <- melt(df, id="Time")
ggplot(data = mdata, aes(x = Time, y = value))+
  geom_point(aes(group= variable, color = variable))+
  geom_line(aes(group= variable, color = variable))
library(BlandAltmanLeh)
BlandAltman_df <- bland.altman.plot(df$Temp_core, df$Temperature, graph.sys = "ggplot2")
print(BlandAltman_df +theme(plot.title=element_text(hjust = 0.5)))

I now want to run all the functions above for the entire list of tibbles, and the variables within them, at once and get all the corresponding statistics and plots, so that I can later create a Markdown document. I tried lapply, but I could not get it to work. I hope I have formulated the question correctly. I appreciate the help!

PS: here is the output from dput(head(df, 20))

structure(list(Time = structure(c(52465, 52525, 52585, 52645, 
52705, 52765, 52825, 52885, 52945, 53005, 53065, 53125, 53185, 
53245, 53305, 53365, 53425, 53485, 53545, 53605), class = c("hms", 
"difftime"), units = "secs"), Temp_core = c(35.565, 36.097, 36.38, 
36.591, 36.782, 36.927, 37.067, 37.149, 37.208, 37.249, 37.276, 
37.296, 37.327, 37.349, 37.356, 37.376, 37.393, 37.397, 37.409, 
37.432), Temperature = c(33.87, 34.52, 34.85, 35.12, 35.37, 35.59, 
35.74, 35.82, 35.95, 36, 36.06, 36.17, 36.23, 36.18, 36.16, 
36.18, 36.19, 36.19, 36.37, 36.37)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • You can use lapply to apply the test and plot code to each list member and return lists of test results and plots, along the following lines.

    library(ggplot2)
    library(epiR)
    library(BlandAltmanLeh)
    
    Data_List <- lapply(Data_List, \(X){
      X[["Temperature"]] <- X[["Temperature"]] + 0.7
      X
    })
    
    cor_test_list <- lapply(Data_List, \(X) cor.test(formula = ~ Temperature + Temp_core, data = X))
    lin_test_list <- lapply(Data_List, \(X){
      epi.ccc(
        X[["Temp_core"]], 
        X[["Temperature"]], 
        ci = "z-transform", 
        conf.level = 0.95, 
        rep.measure = FALSE
      )
    })
    
    gg_plot_list <- lapply(Data_List, \(X){
      mdata <- reshape2::melt(X, id = "Time")
      ggplot(data = mdata, aes(x = Time, y = value))+
        geom_point(aes(group = variable, color = variable))+
        geom_line(aes(group= variable, color = variable))
    })
    
    BlandAltman_List <- lapply(Data_List, \(X){
      BlandAltman_df <- bland.altman.plot(X$Temp_core, X$Temperature, graph.sys = "ggplot2")
      BlandAltman_df + 
        theme(plot.title = element_text(hjust = 0.5))
    })
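
The question also asks how to split each tibble into row ranges. One possible approach is a nested lapply over a named list of row indices. The sketch below is only an illustration: the toy Data_List, the helper make_df and the names phases, Phase_List and phase_means are assumptions, and it presumes that every data frame uses the same row ranges as in the question.

```r
# Toy stand-in for Data_List: two data frames with 54 rows each (assumed layout).
set.seed(1)
make_df <- function() {
  data.frame(
    Time = seq_len(54),
    Temp_core = 37 + rnorm(54, sd = 0.2),
    Temperature = 36 + rnorm(54, sd = 0.3)
  )
}
Data_List <- list(df_a = make_df(), df_b = make_df())

# Row ranges taken from the question; assumed to be identical for every data frame.
phases <- list(
  Sitting  = 1:12,
  Standing = 13:26,
  LigEx    = 27:35,
  VigEx    = 36:42,
  After    = 43:54
)

# For each data frame, return a named list with one row subset per phase.
Phase_List <- lapply(Data_List, \(X) lapply(phases, \(rows) X[rows, ]))

# Access one piece by data frame name and phase name.
nrow(Phase_List$df_a$Standing)

# Simple descriptive statistic per phase: the mean of Temperature.
# The result is a matrix with one row per phase and one column per data frame.
phase_means <- sapply(Phase_List, \(P) sapply(P, \(d) mean(d$Temperature)))
phase_means
```

Keeping the phases in a named list means the ranges are written once, and the same nested structure can then be passed to the test and plot code above, one phase at a time.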
    

    The tests

    To access the test results, use *apply loops again, together with extraction functions.

    sapply(cor_test_list, "[[", "estimate")
    # df_a.cor  df_b.cor  df_c.cor 
    #0.7425467 0.5259107 0.4572278 
    
    sapply(cor_test_list, "[[", "statistic")
    #  df_a.t   df_b.t   df_c.t 
    #7.680738 4.283887 3.561892 
    
    sapply(cor_test_list, "[[", "p.value")
    #        df_a         df_b         df_c 
    #6.709843e-10 8.771860e-05 8.434625e-04 
    
    sapply(lin_test_list, "[[", "rho.c")
    sapply(lin_test_list, "[[", "sblalt")
    

    The plots

    The plots can be displayed one by one:

    gg_plot_list[[1]]
    BlandAltman_List[[1]]
    

    or in a loop with print.

    for(i in seq_along(gg_plot_list)) 
      print(gg_plot_list[[i]])
    

    Or they can be written to a graphics device (a file on disk).

    for(i in seq_along(gg_plot_list)) {
      filename <- sprintf("Rplot%03d.png", i)
      png(filename = filename)
      print(gg_plot_list[[i]])
      dev.off()
    }
    

    Test data

    Data_List <- iris[1:2]
    names(Data_List) <- c("Temp_core", "Temperature")
    Data_List$Time <- rep(1:50, 3)
    Data_List <- split(Data_List, iris$Species)
    names(Data_List) <- paste("df", letters[1:3], sep = "_")
    Data_List <- lapply(Data_List, \(x){row.names(x) <- NULL; x})
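
With real data, the list would presumably come from the CSV files mentioned in the question rather than from iris. A minimal sketch of reading a whole folder into a named list is shown below. The folder csv_dir and the file names subject_01.csv and subject_02.csv are hypothetical, and two tiny files are written to a temporary directory first so that the example is self-contained.

```r
# Hypothetical folder of CSV files, created in a temporary directory
# purely so that this example can run on its own.
csv_dir <- file.path(tempdir(), "temperature_csv")
dir.create(csv_dir, showWarnings = FALSE)

write.csv(
  data.frame(Time = 1:3, Temp_core = c(36.1, 36.4, 36.6), Temperature = c(34.5, 34.9, 35.1)),
  file.path(csv_dir, "subject_01.csv"), row.names = FALSE
)
write.csv(
  data.frame(Time = 1:3, Temp_core = c(36.3, 36.5, 36.8), Temperature = c(34.7, 35.0, 35.4)),
  file.path(csv_dir, "subject_02.csv"), row.names = FALSE
)

# Find all CSV files in the folder and read each one into a list element.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
Data_List <- lapply(files, read.csv)

# Name the list elements after the files (without the .csv extension),
# so that results and plots can later be matched to their source.
names(Data_List) <- tools::file_path_sans_ext(basename(files))
str(Data_List, max.level = 1)
```

read.csv returns plain data frames; if tibbles are preferred, readr::read_csv could be swapped in, and the lapply code above should work on either.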