Search code examples
rlinear-regressionpiecewise

Segmented package breakpoints are variable and finding standard errors on breakpoints


Two questions today: one is about using the segmented package to create piece wise regression and getting different breakpoints when I run the model several times, and the second is about finding the standard error about the breakpoint.

Load and view data:

gbay<-data.frame(matrix(,nrow=46,ncol=3))
colnames(gbay)<-c("pop","cal.length","temp")

gbay$cal.length<-c(0.597, 0.834, 1.120, 1.353, 0.119, 1.301, 0.944, 3.127, 3.375, 3.171, 3.400, 3.376, 3.322, 3.785, 3.304, 3.197, 3.216,
 4.183, 2.171, 3.989, 3.187 ,4.153, 3.252, 4.960, 4.268, 4.827, 4.869, 3.932, 4.573, 4.645, 4.634, 4.713, 4.879, 4.724,
5.031, 4.746, 5.047, 5.714, 5.227, 4.701,4.280, 5.296, 4.977, 4.932, 4.374, 4.758)

gbay$temp<-c(16, 16, 16, 16, 16, 16, 16, 20, 20, 20, 20, 20, 20, 20, 20, 24, 24, 24, 24, 24, 24, 24, 24, 26, 26, 26, 26, 26, 26, 26, 28, 28, 28, 28,
28, 28, 28, 28, 28, 30, 30, 30, 30, 30, 30, 30)
gbay$pop<-gb

ggplot(gbay,aes(x=temp,y=cal.length))+geom_point()

1) Above I am attaching snail growth rate data over laboratory temperature. I want to create a piecewise regression that will identify the optimal growth temperature and maximal growth rate with a breakpoint. I have been using the segmented package to do so with some success, however, I am having some difficulty with my data in that the package goes back and forth between two breakpoints, one at 27.56 (which, based on the raw data and my question, the breakpoint x component I want) and another at 18.59, which occurs at a "false" optima that I don't want the model to calculate. I've tried manipulating psi in the segmented model, but it's 50-50 which breakpoint I get each time I run the model. Is there a way to tell the package to only focus on certain x bounds to search for the breakpoint?

m.gbay<-glm(cal.length~temp,gbay,family=gaussian)
seg.gbay<-segmented(m.gbay,seg.Z = ~temp, psi=28)
xmin<-min(gbay$temp,na.rm=T)
xmax<-max(gbay$temp,na.rm=T)
predicted.gbay<-data.frame(temp=seq(xmin,xmax,length.out=100))
predicted.gbay$cal.length<-predict(seg.gbay,predicted.gbay)
predicted.gbay$pop<-"gb"

ggplot(predicted.gbay,aes(x=temp,y=cal.length))+geom_line(aes(x=temp,y=cal.length))+
  ylab("Shell Length (mm)")+xlab("Common Garden Temperature (°C)")

summary(seg.gbay)

2) I am trying to extract both the x and y components of the breakpoint (psi) from this data. I have done so successfully. However, I was also hoping to be able to extract the error of the x and y components for the breakpoint. I think the model spits out the standard error for the x component, but I was wondering if there is a way in segmented or another package to find the error in the y component of the breakpoint?

breakpts<-data.frame(matrix(,nrow=1,ncol=4))
colnames(breakpts)<-c("brkptx","brkpty","x_err","y_err")

breakpts[1,1]<-seg.gbay$psi[[2]]
breakpts[1,2]<-(seg.gbay$psi[[2]]*coef(seg.gbay)[[2]])+(coef(seg.gbay)[[1]])
breakpts[1,3]<-seg.gbay$psi[[3]]

Solution

  • This dataset is quite small and hence it's ambiguous where the change point lies if you have no prior knowledge to guide it. Most existing packages just identify a single change point without quantifying their often weird distributions.

    I think the mcp package does what you want. Briefly, you model a line followed by a joined slope:

    model = list(
      cal.length ~ 1 + temp,  # Line with intercept
      ~ 0 + temp  # Joined slope
    )
    

    Now let's fit it. family = gaussian() is implicit. Note that I set a lot of iterations and parallel processing because the location of the change point is so pretty unidentifiable in this case, so a lot of work is needed to explore the posterior:

    library(mcp)
    fit = mcp(model, data = gbay, iter = 100000, cores = 3)
    

    The default plot shows the posterior for the estimated change point (blue lines). We can add fitted intervals (red) and prediction intervals (green). The gray lines are 25 random samples from the posterior:

    plot(fit, q_fit = T, q_predict = T)
    

    enter image description here

    As you can see, the posterior for the change point is bimodal. You can also call plot_pars(fit) and summary(fit) to see more details about individual parameters. If you want to test evidence for the earliest mode versus the second mode, you can use hypothesis(fit, "cp_1 < 25"). If you have a priori knowledge about credible slopes, change point locations, etc., this can easily be added using, e.g., mcp(model, gbay, prior = list(cp_1 = "dunif(0, 25)")).

    Read more about mcp, including install instructions, at the mcp website and in this preprint. Disclosure: I am the author of mcp.