Reproducible data set:
library(xts)
set.seed(55)
data <- rnorm(8)
dates <- as.POSIXct("2019-03-18 10:30:00", tz = "CET") + 0:7*60
dataset <- xts(x = data, order.by = dates)
colnames(dataset) <- "R"
dataset$Timestep <- 1:8
dataset$Label <- 1
dataset$Label[4:8,] <- 2
I am trying to fit a linear regression model separately for each Label, with "R" as the dependent variable and "Timestep" as the predictor, and return all the slopes (in this case, two).
My initial thought was to use split and lapply, but I could not get it to work because I don't know how to access a list of lists with lapply.
As the dataset is really large, I want to avoid a for loop. Can you help? I really appreciate it.
1) formula Use the formula shown, which nests Timestep within Label so that a single model gets a separate intercept and slope for each Label:
co <- coef(lm(R ~ factor(Label) / (Timestep + 1) + 0, dataset))
co[grep("Timestep", names(co))]
## factor(Label)1:Timestep factor(Label)2:Timestep
## 0.01572195 0.15327212
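As a quick check that the nested formula gives the same slopes as fitting each Label on its own, the sketch below compares the two. It assumes a plain data.frame named df with the same columns instead of the xts object, so it runs without the xts package; the names nested_slopes and separate_slopes are only illustrative.

```r
# Assumption: a plain data.frame with the same columns as `dataset`,
# so this runs without the xts package.
set.seed(55)
df <- data.frame(R = rnorm(8), Timestep = 1:8, Label = rep(1:2, c(3, 5)))

# One model with a separate intercept and slope for each Label
co <- coef(lm(R ~ factor(Label) / (Timestep + 1) + 0, data = df))
nested_slopes <- unname(co[grep("Timestep", names(co))])

# Separate ordinary regressions, one per Label, for comparison
separate_slopes <- unname(sapply(split(df, df$Label),
                                 function(g) coef(lm(R ~ Timestep, data = g))[["Timestep"]]))

all.equal(nested_slopes, separate_slopes)  # TRUE: the point estimates coincide
```

Because each Label has its own intercept and slope, the least-squares problem separates by group, so the point estimates match; only quantities that depend on the residual variance (standard errors, p-values) differ, since the single model pools the residuals.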
2) split/lapply Alternatively, split the data by Label and apply a slope function to each piece with sapply:
slope <- function(x) coef(lm(R ~ Timestep, x))[2]
sapply(split(dataset, dataset$Label), slope)
## 1.Timestep 2.Timestep
## 0.01572195 0.15327212
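Since the question mentions trouble accessing the list that split returns, here is a sketch of the lapply form that keeps every fitted model and then pulls pieces out of the list. It assumes a plain data.frame df with the same columns rather than the xts object; groups, fits and coefs are illustrative names.

```r
# Assumption: a plain data.frame with the same columns as `dataset`
set.seed(55)
df <- data.frame(R = rnorm(8), Timestep = 1:8, Label = rep(1:2, c(3, 5)))

# split() returns a named list with one data frame per Label
groups <- split(df, df$Label)
names(groups)  # "1" "2"

# lapply() passes each list element (one data frame) to the function
fits <- lapply(groups, function(g) lm(R ~ Timestep, data = g))

# Access one fitted model with [[ ]], then its slope
coef(fits[["2"]])[["Timestep"]]

# Or collect intercept and slope for every Label: one row per Label
coefs <- t(sapply(fits, coef))
coefs[, "Timestep"]
```

Keeping the full lm objects is useful if you later need residuals or summaries; if only the slopes are needed, the sapply one-liner above is leaner.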
2a) Alternatively, keep the same last line of code but replace the slope function with one that computes the slope directly, without lm:
slope <- function(x) with(as.data.frame(x), cov(R, Timestep) / var(Timestep))
sapply(split(dataset, dataset$Label), slope) # same as sapply line in (2)
## 1 2
## 0.01572195 0.15327212
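The cov/var ratio works because the OLS slope is sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2), and the n - 1 denominators of cov and var cancel. For a very large dataset the same formula can be computed for all Labels at once with grouped sums, without splitting into a list. This is a swapped-in base R technique, not part of the original answer, and it again assumes a plain data.frame df with the same columns.

```r
# Assumption: a plain data.frame with the same columns as `dataset`
set.seed(55)
df <- data.frame(R = rnorm(8), Timestep = 1:8, Label = rep(1:2, c(3, 5)))

# Center R and Timestep within each Label (ave() returns group means aligned with rows)
xc <- df$Timestep - ave(df$Timestep, df$Label)
yc <- df$R - ave(df$R, df$Label)

# Per-Label sums of cross-products and squares; rowsum() sums rows by group
sxy <- rowsum(xc * yc, df$Label)
sxx <- rowsum(xc^2, df$Label)

# Slope per Label, as a vector named by Label
slopes <- drop(sxy / sxx)
slopes
```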
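3) nlme The nlme package ships with R as a recommended package, so it does not have to be installed. Its lmList function fits a separate lm for each group given after the |: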
library(nlme)
coef(lmList(R ~ Timestep | Label, dataset))[, "Timestep"]
## [1] 0.01572195 0.15327212
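The [1] in that output shows the slopes come back unnamed, because extracting a column with [, "Timestep"] drops the row names. The coefficient table returned by coef() keeps one row per Label, so the Label names can be reattached as sketched below. It assumes a plain data.frame df with the same columns instead of the xts object; fits, cf and slopes_nlme are illustrative names.

```r
library(nlme)

# Assumption: a plain data.frame with the same columns as `dataset`
set.seed(55)
df <- data.frame(R = rnorm(8), Timestep = 1:8, Label = rep(1:2, c(3, 5)))

fits <- lmList(R ~ Timestep | Label, data = df)

# One row per Label, one column per coefficient
cf <- coef(fits)

# Slopes named by Label
slopes_nlme <- setNames(cf[, "Timestep"], rownames(cf))
slopes_nlme
```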