I want to fit a linear-plateau (nls) model describing height as a function of age, and I want to test if there are signiffiant differences for any parameter of the model between regions.
Here is what I have so far:
# Create data
df1 <- cbind.data.frame (height = c (0.5, 0.6, 0.9, 1.3, 1.5, 1.6, 1.6,
0.6, 0.6, 0.8, 1.3, 1.5, 1.6, 1.5,
0.6, 0.8, 1.0, 1.4, 1.6, 1.6, 1.6,
0.5, 0.8, 1.0, 1.3, 1.6, 1.7, 1.6),
age = c (0.5, 0.9, 3.0, 7.3, 12.2, 15.5, 20.0,
0.4, 0.8, 2.3, 8.5, 11.5, 14.8, 21.3,
0.5, 1.0, 5.1, 11.1, 12.3, 16.0, 19.8,
0.5, 1.1, 5.5, 10.2, 12.2, 15.4, 20.5),
region = as.factor (c (rep ("A", 7),
rep ("B", 7),
rep ("C", 7),
rep ("D", 7))))
> head (df1)
height age region
1 0.5 0.5 A
2 0.6 0.9 A
3 0.9 3.0 A
4 1.3 7.3 A
5 1.5 12.2 A
6 1.6 15.5 A
# Create linear-plateau function
lp <- function(x, a, b, c){
ifelse (x < c, a + b * x, a + b * c)
} # Where 'a' is the intercept, 'b' the slope and 'c' the breakpoint
# Fit the model ignoring region
m1 <- nls (height ~ lp (x = age, a, b, c),
data = df1,
start = list (a = 0.5, b = 0.1, c = 13))
> summary (m1)
Formula: height ~ lp(x = age, a, b, c)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 0.582632 0.025355 22.98 <2e-16 ***
b 0.079957 0.003569 22.40 <2e-16 ***
c 12.723995 0.511067 24.90 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07468 on 25 degrees of freedom
Number of iterations to convergence: 2
Achieved convergence tolerance: 5.255e-09
I want to fit the same model but considering region
, and test if a
, b
, and c
estimates are different between regions.
I belive this post might be usefull, but I dont know how to apply it to this data/function.
Here is how data looks like:
Solutions without using nls are welcome as well
Fit the model with the same parameters for each region giving fm1 and again with different parameters giving fm2 and use anova to test the difference.
We use the plinear
algorithm for fm1 as it eliminates the need for starting values for the linear parameters. In that case the RHS should be a matrix whose first column multiplies the intercept and whose second column multiplies the slope. The two linear parameters will be named .lin1
and .lin2
. We use the coefficients from fm1 repeated 4 times as starting values for the fm2 fit.
fm1 <- nls(height ~ cbind(1, pmin(age, c)), df1, start = list(c = mean(df1$age)),
algorithm = "plinear")
co <- as.list(coef(fm1))
fm2 <- nls(height ~ a[region] + b[region] * pmin(age, c[region]), df1,
start = list(a = rep(co$.lin1, 4), b = rep(co$.lin2, 4), c = rep(co$c, 4)))
anova(fm1, fm2)
giving:
Analysis of Variance Table
Model 1: height ~ cbind(1, pmin(age, c))
Model 2: height ~ a[region] + b[region] * pmin(age, c[region])
Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
1 25 0.13944
2 16 0.11895 9 0.020483 0.3061 0.9617
thus we cannot reject the hypothesis that the parameters are the same across regions.
If we wished to test differing values of c but common intercepts and slopes we could use
fm3 <- nls(height ~ cbind(1, pmin(age, c[region])), df1,
start = list(c = rep(co$c, 4)), algorithm = "plinear")
anova(fm1, fm3)
Although we cannot reject the hypothesis that the values of c are the same across regions visually below we see that the cutoff ages for plateau values do look somewhat different so we might want to use fm3 even though it is not significantly different than fm1. We might want to be guided by other factors associated with the application here rather than just the fit.
Below we show the individual fits from fm2 and the overall fit from fm1.
library(ggplot2)
df1$Everything <- "Everything"
ggplot(df1, aes(age, fitted(fm2), col = region)) +
geom_line() +
geom_point() +
geom_line(aes(age, fitted(fm1), col = Everything), lty = 2, lwd = 2)