Search code examples
rdataframedplyrlmpiecewise

Extract breakpoints from multiple variables from grouped data using piecewise (segmented) regression


I would like to use piecewise regression and extract breakpoints across multiple variables from my grouped data.

I have used following code to do it one by one for each variable & group:

library(segmented)

mod_lm <- lm(y ~ x, data = df) #Do LM
mod_seg <- segmented(mod_lm, seg.Z = ~ x) #Do segmented regression
mod_seg$psi #Extract breakpoint & standard error of the estimate

I would like to run this across dependent variables, while independent variable remains the same. I also have grouping variable in the data, which I would also like to have included.

My data looks like this:

x   Group   Var     y
9   Group1  Var1    0.6901
6   Group1  Var1    0.6346
5   Group1  Var1    0.8089
5   Group1  Var1    0.1274
7   Group1  Var1    0.6426
1   Group1  Var2    0.1059
2   Group1  Var2    0.6989
4   Group1  Var2    0.1129
7   Group1  Var2    0.1458
7   Group1  Var2    0.8185
2   Group2  Var1    0.7950
0   Group2  Var1    0.0533
1   Group2  Var1    0.1866
3   Group2  Var1    0.3876
8   Group2  Var1    0.2788
2   Group2  Var2    0.1559
8   Group2  Var2    0.3382
1   Group2  Var2    0.6346
9   Group2  Var2    0.6038
8   Group2  Var2    0.2026

How can I get the breakpoints and store them in a new dataframe?


Solution

  • This heavily relies on this answer: How to use segmented package when working with data frames with dplyr package to perform piecewise linear regression?

    I am not sure if this is what you are after, but we can loop over the groups and extract psi at the end. You can also rename the columns to be initial, Est., and St.Err if you like. With this dataset, I cannot quite wrap my head around "cleaning up" since it only returns results for one of the groups.

    library(tidyverse)
    library(segmented)
    
    suppressWarnings(
    df %>% 
      nest_by(Group, Var) %>%
      mutate(mod_lm = list(lm(y ~ x, data = data)),
             mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
                            error = function(e) list(NA))),
             psi = list(mod_seg[['psi']])) %>% 
      unnest(cols = psi, keep_empty = TRUE)
    )
    #> breakpoint estimate(s): 5.895779
    #> # A tibble: 4 x 6
    #> # Groups:   Group, Var [4]
    #>   Group  Var                 data mod_lm mod_seg    psi[,1]  [,2]  [,3]
    #>   <chr>  <chr> <list<tibble[,2]>> <list> <list>       <dbl> <dbl> <dbl>
    #> 1 Group1 Var1             [5 x 2] <lm>   <list [1]>      NA NA    NA   
    #> 2 Group1 Var2             [5 x 2] <lm>   <lm>            NA NA    NA   
    #> 3 Group2 Var1             [5 x 2] <lm>   <segmentd>       2  2.00  1.17
    #> 4 Group2 Var2             [5 x 2] <lm>   <list [1]>      NA NA    NA
    

    Data:

    read.table(text = "x  Group Var y
    9 Group1  Var1  0.6901
    6 Group1  Var1  0.6346
    5 Group1  Var1  0.8089
    5 Group1  Var1  0.1274
    7 Group1  Var1  0.6426
    1 Group1  Var2  0.1059
    2 Group1  Var2  0.6989
    4 Group1  Var2  0.1129
    7 Group1  Var2  0.1458
    7 Group1  Var2  0.8185
    2 Group2  Var1  0.7950
    0 Group2  Var1  0.0533
    1 Group2  Var1  0.1866
    3 Group2  Var1  0.3876
    8 Group2  Var1  0.2788
    2 Group2  Var2  0.1559
    8 Group2  Var2  0.3382
    1 Group2  Var2  0.6346
    9 Group2  Var2  0.6038
    8 Group2  Var2  0.2026", header = T, stringsAsFactor = F) -> df