Search code examples
rmodelglmspline

Using the ns() function from the splines package inside glm()


I'm trying to use the ns() function from the splines package with a poisson GLM I am using to test for significance of particulate matter concentration (pm.lag0) on health outcomes (Freq):

   > gfit4 = glm(Freq ~ pm.lag0 + ns(date, df=2), family = poisson(), 
                 data = dt,  offset = log(pop))

I get these errors back:

Error in splineDesign(knots, x, ord, derivs, outer.ok = outer.ok) : 
  must have at least 'ord' knots
In addition: Warning message:
In sort(as.numeric(knots)) : NAs introduced by coercion

Is that not a valid use of ns()? Can someone help me decode this error message? The splines documentation that R provides doesn't seem to match this error (?ns).


Solution

  • I can't see a reason why, in principle, it is not possible to use ns() in glm(). To see why, study what ns() does in a formula. From ?ns

    > model.frame(weight ~ ns(height, df = 5), data = women)
       weight ns(height, df = 5).1 ns(height, df = 5).2 ns(height, df = 5).3 ns(height, df = 5).4 ns(height, df = 5).5
    1     115         0.000000e+00         0.000000e+00         0.000000e+00         0.000000e+00         0.000000e+00
    2     117         7.592323e-03         0.000000e+00        -8.670223e-02         2.601067e-01        -1.734045e-01
    3     120         6.073858e-02         0.000000e+00        -1.503044e-01         4.509132e-01        -3.006088e-01
    4     123         2.047498e-01         6.073858e-05        -1.677834e-01         5.033503e-01        -3.355669e-01
    5     126         4.334305e-01         1.311953e-02        -1.324404e-01         3.973211e-01        -2.648807e-01
    6     129         6.256681e-01         8.084305e-02        -7.399720e-02         2.219916e-01        -1.479944e-01
    7     132         6.477162e-01         2.468416e-01        -2.616007e-02         7.993794e-02        -5.329196e-02
    8     135         4.791667e-01         4.791667e-01         1.406302e-02         2.031093e-02        -1.354062e-02
    9     139         2.468416e-01         6.477162e-01         9.733619e-02         2.286023e-02        -1.524015e-02
    10    142         8.084305e-02         6.256681e-01         2.707683e-01         6.324188e-02        -4.052131e-02
    11    146         1.311953e-02         4.334305e-01         4.805984e-01         1.252603e-01        -5.240872e-02
    12    150         6.073858e-05         2.047498e-01         5.954160e-01         1.989926e-01         7.809246e-04
    13    154         0.000000e+00         6.073858e-02         5.009718e-01         2.755102e-01         1.627794e-01
    14    159         0.000000e+00         7.592323e-03         2.246113e-01         3.520408e-01         4.157556e-01
    15    164         0.000000e+00         0.000000e+00        -1.428571e-01         4.285714e-01         7.142857e-01
    

    Which shows that it provides the B-spline basis for the natural spline of the height variable. Nothing special here.

    Hence I suspect your date variable is not numeric or not something that R could work with by coercing it to be numeric without introducing NA - see the warning message. Without a reproducible example and information on your data it is impossible to tell however!

    In addition, you might want to look at the gam() function in package mgcv which as a Recommended package is distributed with R. It is designed to fit semi-parametric models in the manner you describe and can include parametric terms as well as smooths/splines of other terms. The package is fairly comprehensive and can fit a large number of types of spline. See it's manual.