I would like to use piecewise regression and extract breakpoints across multiple variables from my grouped data.
I have used following code to do it one by one for each variable & group:
library(segmented)
mod_lm <- lm(y ~ x, data = df) #Do LM
mod_seg <- segmented(mod_lm, seg.Z = ~ x) #Do segmented regression
mod_seg$psi #Extract breakpoint & standard error of the estimate
I would like to run this across dependent variables, while independent variable remains the same. I also have grouping variable in the data, which I would also like to have included.
My data looks like this:
x Group Var y
9 Group1 Var1 0.6901
6 Group1 Var1 0.6346
5 Group1 Var1 0.8089
5 Group1 Var1 0.1274
7 Group1 Var1 0.6426
1 Group1 Var2 0.1059
2 Group1 Var2 0.6989
4 Group1 Var2 0.1129
7 Group1 Var2 0.1458
7 Group1 Var2 0.8185
2 Group2 Var1 0.7950
0 Group2 Var1 0.0533
1 Group2 Var1 0.1866
3 Group2 Var1 0.3876
8 Group2 Var1 0.2788
2 Group2 Var2 0.1559
8 Group2 Var2 0.3382
1 Group2 Var2 0.6346
9 Group2 Var2 0.6038
8 Group2 Var2 0.2026
How can I get the breakpoints and store them in a new dataframe?
This heavily relies on this answer: How to use segmented package when working with data frames with dplyr package to perform piecewise linear regression?
I am not sure if this is what you are after, but we can loop over the groups and extract psi
at the end. You can also rename the columns to be initial
, Est.
, and St.Err
if you like. With this dataset, I cannot quite wrap my head around "cleaning up" since it only returns results for one of the groups.
library(tidyverse)
library(segmented)
suppressWarnings(
df %>%
nest_by(Group, Var) %>%
mutate(mod_lm = list(lm(y ~ x, data = data)),
mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
error = function(e) list(NA))),
psi = list(mod_seg[['psi']])) %>%
unnest(cols = psi, keep_empty = TRUE)
)
#> breakpoint estimate(s): 5.895779
#> # A tibble: 4 x 6
#> # Groups: Group, Var [4]
#> Group Var data mod_lm mod_seg psi[,1] [,2] [,3]
#> <chr> <chr> <list<tibble[,2]>> <list> <list> <dbl> <dbl> <dbl>
#> 1 Group1 Var1 [5 x 2] <lm> <list [1]> NA NA NA
#> 2 Group1 Var2 [5 x 2] <lm> <lm> NA NA NA
#> 3 Group2 Var1 [5 x 2] <lm> <segmentd> 2 2.00 1.17
#> 4 Group2 Var2 [5 x 2] <lm> <list [1]> NA NA NA
read.table(text = "x Group Var y
9 Group1 Var1 0.6901
6 Group1 Var1 0.6346
5 Group1 Var1 0.8089
5 Group1 Var1 0.1274
7 Group1 Var1 0.6426
1 Group1 Var2 0.1059
2 Group1 Var2 0.6989
4 Group1 Var2 0.1129
7 Group1 Var2 0.1458
7 Group1 Var2 0.8185
2 Group2 Var1 0.7950
0 Group2 Var1 0.0533
1 Group2 Var1 0.1866
3 Group2 Var1 0.3876
8 Group2 Var1 0.2788
2 Group2 Var2 0.1559
8 Group2 Var2 0.3382
1 Group2 Var2 0.6346
9 Group2 Var2 0.6038
8 Group2 Var2 0.2026", header = T, stringsAsFactor = F) -> df