
Spline model with calendar year


I want to find out whether the mortality rate (variable "mortality_rate") changed over the years (variable "Year"). Since the relationship between Year and mortality_rate is not linear (see figure), I want to fit a spline model with Year as the independent variable and mortality_rate as the dependent variable. How can I fit a spline model with 20 knots on Year?

[Figure: mortality_rate plotted against Year]

I have the following data in R:

dat <- structure(list(Year = c(1998, 1999, 2000, 2001, 2002, 2003, 2004, 
2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 
2016, 2017, 2018), mortality_rate = c(0.0088, 0.0077, 0.0082, 
0.0075, 0.0076, 0.0075, 0.0066, 0.0061, 0.0059, 0.0054, 0.0054, 
0.0058, 0.0056, 0.006, 0.0053, 0.0061, 0.0052, 0.0055, 0.0069, 
0.0074, 0.0073)), row.names = c(NA, 21L), class = "data.frame")

Solution

  • A second-degree polynomial visually fits the data well (see plot at end), and all coefficients are highly significant:

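    # Fit a quadratic in Year using orthogonal polynomial terms
    fm <- lm(mortality_rate ~ poly(Year, 2), dat)
    plot(dat)
    # Overlay the fitted curve on the scatter plot
    lines(fitted(fm) ~ Year, dat, col = "red")
    summary(fm)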
    

    giving:

    Call:
    lm(formula = mortality_rate ~ poly(Year, 2), data = dat)
    
    Residuals:
           Min         1Q     Median         3Q        Max 
    -8.066e-04 -2.774e-04  1.149e-05  2.689e-04  7.702e-04 
    
    Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
    (Intercept)     0.0065619  0.0001013  64.793  < 2e-16 ***
    poly(Year, 2)1 -0.0024938  0.0004641  -5.373 4.17e-05 ***
    poly(Year, 2)2  0.0036130  0.0004641   7.785 3.61e-07 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.0004641 on 18 degrees of freedom
    Multiple R-squared:  0.8325,    Adjusted R-squared:  0.8139 
    F-statistic: 44.74 on 2 and 18 DF,  p-value: 1.037e-07
    

    [Plot: mortality_rate against Year, with the fitted quadratic curve in red]
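Since the question asks about splines specifically, here is a minimal sketch of a natural cubic spline fit for comparison with the quadratic, using ns() from the splines package that ships with R. The choice df = 3 and the name fm_ns are assumptions made for illustration only. Note that with 21 observations, a spline with 20 knots would need more parameters than there are data points, so a much smaller number is used here.

```r
library(splines)  # bundled with R; provides ns() and bs()

# Data from the question
dat <- data.frame(
  Year = 1998:2018,
  mortality_rate = c(0.0088, 0.0077, 0.0082, 0.0075, 0.0076, 0.0075, 0.0066,
                     0.0061, 0.0059, 0.0054, 0.0054, 0.0058, 0.0056, 0.006,
                     0.0053, 0.0061, 0.0052, 0.0055, 0.0069, 0.0074, 0.0073)
)

# Quadratic fit, as in the answer above
fm <- lm(mortality_rate ~ poly(Year, 2), dat)

# Natural cubic spline in Year; df = 3 is an illustrative assumption,
# not a recommendation. ns() places the interior knots at quantiles of Year.
fm_ns <- lm(mortality_rate ~ ns(Year, df = 3), dat)

# Compare the two fits; lower AIC indicates a better trade-off
# between fit and complexity
AIC(fm, fm_ns)

# Plot the data with both fitted curves
plot(dat)
lines(fitted(fm) ~ Year, dat, col = "red")      # quadratic
lines(fitted(fm_ns) ~ Year, dat, col = "blue")  # natural spline
```

Raising df adds knots and flexibility, but each extra knot uses up a residual degree of freedom, so with this few data points small values are the safer starting point.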