How to create new object from columns of dataframe in list in R and use this for modeling in R?


I have a list of 10 dataframes, called "datalist", on which I want to apply several functions. I am quite new to R. I searched on the internet but can't find the right solution.

The dataframes all have the same variables, as in the simplified example below:

ID FID WETLAND TPI200 TPI350 TPI500 ...
1  1   no wetl 52     35     20     ...
2  2   wetl    21     19     19     ...
... 

The goal is to fit a PLS-DA model to each dataframe. To do that, I first want to create, for each dataframe, an X object holding the values of the variables in the 4th to 8th columns and a Y object holding the 3rd column ("WETLAND"). But how do I do this for every dataframe? Should I use for loops to create X1, X2, ... and Y1, Y2, ... for the 10 dataframes? Or should I use lapply?

Second, I want to fit a PLS-DA model for every dataframe using the X and Y created from it. I can do this with the following code for one dataframe, but how do I apply it to every dataframe in the list?

library(mixOmics)

model.splsda <- splsda(X, Y, keepX = c(5, 5))
model.splsda$loadings

Solution

  • You can write a custom model-fitting function that subsets each data.frame to the columns of interest and then runs the modeling function.
    Apply this function to the data list with lapply, then extract the loadings from the resulting list of models with a second lapply call.

    library(mixOmics)
    
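    # Fit an sPLS-DA model to one data.frame.
    # Xcols: predictor columns (indices or names); Ycol: name of the class column.
    custom_splsda <- function(data, ncomp, keepX, ..., Xcols, Ycol){
      Y <- data[[Ycol]]   # class labels as a vector
      X <- data[Xcols]    # predictors, kept as a data.frame
      splsda(X, Y, ncomp = ncomp, keepX = keepX, ...)
    }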
    
    model_list <- lapply(datalist, custom_splsda, ncomp = 2, keepX = c(5, 5), Xcols = 4:8, Ycol = "WETLAND")
    loadings_list <- lapply(model_list, '[[', 'loadings')
    
    loadings_list[[1]]
    #$X
    #         comp 1     comp 2
    #1401 -0.7929405 -0.2459434
    #1141 -0.3835902 -0.2429583
    #563   0.2417486  0.1065414
    #1257  0.3608328 -0.8967159
    #509  -0.1883114 -0.2550150
    #
    #$Y
    #       comp 1     comp 2
    #AF  0.7071068 -0.7071068
    #BE -0.7071068  0.7071068
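
    To make the extraction step less opaque, here is a minimal base R sketch of what lapply(model_list, '[[', 'loadings') does, next to the equivalent for loop. The fake_models list is a hypothetical stand-in that only mimics the shape of a list of fitted models; it is not mixOmics output and its numbers are made up.

    ```r
    # Hypothetical stand-in for model_list: each element is a plain list with a
    # "loadings" component. The values are invented for illustration only.
    fake_models <- list(
      data_1 = list(loadings = c(a = 0.1, b = 0.2), ncomp = 2),
      data_2 = list(loadings = c(a = 0.3, b = 0.4), ncomp = 2)
    )

    # `[[` is an ordinary function, so lapply can call it on every element
    # with "loadings" as the extra argument.
    via_lapply <- lapply(fake_models, `[[`, "loadings")

    # The same extraction written as an explicit for loop.
    via_loop <- vector("list", length(fake_models))
    names(via_loop) <- names(fake_models)
    for (nm in names(fake_models)) {
      via_loop[[nm]] <- fake_models[[nm]]$loadings
    }

    identical(via_lapply, via_loop)  # TRUE
    ```

    Both forms give the same result. The lapply version avoids creating numbered objects such as X1, X2, ... and keeps the names of the input list.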
    

    Data

    The data come from the first example in help('splsda'). The same data set is repeated to create a list with several data sets.

    ## First example
    data(breast.tumors)
    X <- breast.tumors$gene.exp
    # Y will be transformed as a factor in the function,
    # but we set it as a factor to set up the colors.
    Y <- as.factor(breast.tumors$sample$treatment)
    
    names(X) <- breast.tumors$genes$name
    df1 <- data.frame(WETLAND = Y)
    df1 <- cbind(df1, X)
    datalist <- list(df1, df1, df1)
    names(datalist) <- sprintf("data_%d", 1:3)
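
    If you want to try the X/Y split without mixOmics, the sketch below builds a toy list with the column layout from the question and splits every data frame with lapply. Everything in it is an assumption made for illustration: make_toy and split_xy are hypothetical helpers, VAR7 and VAR8 are placeholder names for the columns the question elides, and the values are random.

    ```r
    # Build one toy data frame with the question's layout (8 columns).
    # VAR7 and VAR8 are placeholder names; all values are random.
    make_toy <- function(seed) {
      set.seed(seed)
      n <- 6
      data.frame(
        ID = seq_len(n),
        FID = seq_len(n),
        WETLAND = rep(c("no wetl", "wetl"), length.out = n),
        TPI200 = rnorm(n), TPI350 = rnorm(n), TPI500 = rnorm(n),
        VAR7 = rnorm(n), VAR8 = rnorm(n)
      )
    }

    toy_list <- lapply(1:3, make_toy)
    names(toy_list) <- sprintf("data_%d", 1:3)

    # Split one data frame into a predictor matrix X and a class factor Y.
    split_xy <- function(data, Xcols, Ycol) {
      list(
        X = as.matrix(data[Xcols]),
        Y = factor(data[[Ycol]])
      )
    }

    # One X/Y pair per data frame, stored in a single named list.
    xy_list <- lapply(toy_list, split_xy, Xcols = 4:8, Ycol = "WETLAND")

    dim(xy_list[[1]]$X)       # 6 rows, 5 predictor columns
    levels(xy_list$data_2$Y)  # "no wetl" "wetl"
    ```

    Each element of xy_list can then be passed to a modeling function, for example by calling it on xy$X and xy$Y inside another lapply.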