Tags: r, split, time-series, lapply, xts

How to apply a function over a list, taking the elements of a list of lists as the function argument?


Reproducible data set:

library(xts)  # provides xts()

set.seed(55)
data <- rnorm(8)

dates <- as.POSIXct("2019-03-18 10:30:00", tz = "CET") + 0:7*60
dataset <- xts(x = data, order.by = dates)

colnames(dataset) <- "R"
dataset$Timestep <- 1:8
dataset$Label <- 1
dataset$Label[4:8,] <- 2 

I am trying to fit a linear regression model separately for each label, with "R" as the dependent variable and "Timestep" as the predictor, and return all the slopes (two in this case).

My initial thought was to use split and lapply, but I could not get it to work because I don't know how to access the elements of a list of lists with lapply.

As the dataset is really large, I want to avoid a for loop. Can you guys help? I'd really appreciate it.


Solution

    1) formula. Use the formula shown, which nests Timestep within Label so that each label gets its own intercept and slope:

    co <- coef(lm(R ~ factor(Label) / (Timestep + 1) + 0, dataset))
    co[grep("Timestep", names(co))]
    ## factor(Label)1:Timestep factor(Label)2:Timestep 
    ##              0.01572195              0.15327212 
    

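    2) split/lapply. Alternatively, split the data by Label and apply a slope function to each piece. sapply is used here instead of lapply so that the result is simplified to a named vector: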

    slope <- function(x) coef(lm(R ~ Timestep, x))[2]
    sapply(split(dataset, dataset$Label), slope)
    ## 1.Timestep 2.Timestep 
    ## 0.01572195 0.15327212 
    

    2a) Alternatively, keep the same last line of code but replace the slope function with one that computes the slope directly, without lm:

    slope <- function(x) with(x, cov(R, Timestep)  / var(Timestep))
    sapply(split(dataset, dataset$Label), slope)  # same as sapply line in (2)
    ##          1          2 
    ## 0.01572195 0.15327212 
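
    As a brief justification for (2a): in a simple linear regression with a single predictor, the least-squares slope is the ratio of the sample covariance to the sample variance of the predictor, because the 1/(n-1) factors cancel. Writing r_i for the R values and t_i for the Timestep values within one label (notation introduced here only for illustration):

```latex
\hat{\beta}_1
  = \frac{\sum_{i=1}^{n} (t_i - \bar{t})(r_i - \bar{r})}{\sum_{i=1}^{n} (t_i - \bar{t})^2}
  = \frac{\tfrac{1}{n-1}\sum_{i=1}^{n} (t_i - \bar{t})(r_i - \bar{r})}{\tfrac{1}{n-1}\sum_{i=1}^{n} (t_i - \bar{t})^2}
  = \frac{\operatorname{cov}(R, \mathrm{Timestep})}{\operatorname{var}(\mathrm{Timestep})}
```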
    

    3) nlme. This package comes with R, so it does not have to be installed separately.

    library(nlme)
    coef(lmList(R ~ Timestep | Label, dataset))[, "Timestep"]
    ## [1] 0.01572195 0.15327212
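
To make the "list of lists" part of the question more concrete, here is a minimal sketch of what split hands to lapply/sapply. It assumes a plain data frame with made-up values rather than the xts object above, and the names df, pieces, slope_lm, slopes and slopes_cf are illustrative only. The point is that each list element arrives in the function as one whole group, so its columns can be used directly inside the function.

```r
# Hypothetical toy data in a plain data frame (an assumption for illustration;
# the question itself uses an xts object).
set.seed(1)
df <- data.frame(
  R        = rnorm(8),
  Timestep = 1:8,
  Label    = rep(c(1, 2), times = c(3, 5))
)

# split() returns a named list with one data frame per Label value.
pieces <- split(df, df$Label)
names(pieces)  # "1" "2"

# The function is called once per list element; inside it, x is that
# group's data frame, so R and Timestep refer to that group only.
slope_lm <- function(x) unname(coef(lm(R ~ Timestep, data = x))["Timestep"])
slopes <- vapply(pieces, slope_lm, numeric(1))

# Cross-check against the closed-form slope, cov / var, as in (2a).
slopes_cf <- vapply(
  pieces,
  function(x) cov(x$R, x$Timestep) / var(x$Timestep),
  numeric(1)
)
all.equal(slopes, slopes_cf)
```

vapply is used here instead of sapply only because it checks that every group returns exactly one number; sapply or lapply would work in the same way.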