
NAs produced when including year dummy variables along with a time trend variable in R


I have a model that contains a time trend variable covering 7 years (2000 = 1, 2001 = 2, ..., 2006 = 7) as well as dummy variables for 6 of the years (one binary variable for each year except 2000). When I ask R to fit this linear model:

olsmodel <- lm(lnyield ~ lnx1 + lnx2 + lnx3 + lnx4 + lnx5 + x6 + x7 + x8 + timetrend +
                 yeardummy2001 + yeardummy2002 + yeardummy2003 + yeardummy2004 +
                 yeardummy2005 + yeardummy2006)

the model summary shows NA for the last dummy variable (yeardummy2006), along with the message "Coefficients: (1 not defined because of singularities)".
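A minimal sketch that reproduces the symptom with simulated data. The data frame `df`, the random response and the choice of 3 observations per year are hypothetical and only for illustration; the continuous regressors (lnx1, ..., x8) are left out on the assumption that they are not involved in the singularity.

```r
# Hypothetical simulated data: 3 observations per year, 2000-2006.
# The continuous regressors lnx1, ..., x8 are omitted for brevity.
set.seed(1)
year <- rep(2000:2006, each = 3)
df <- data.frame(
  lnyield   = rnorm(length(year)),
  timetrend = year - 1999             # 2000 = 1, ..., 2006 = 7
)
# One 0/1 dummy per year, excluding the baseline year 2000
for (y in 2001:2006) {
  df[[paste0("yeardummy", y)]] <- as.numeric(year == y)
}

fit <- lm(lnyield ~ timetrend + yeardummy2001 + yeardummy2002 +
            yeardummy2003 + yeardummy2004 + yeardummy2005 + yeardummy2006,
          data = df)
summary(fit)                  # reports "(1 not defined because of singularities)"
coef(fit)["yeardummy2006"]    # NA
```

The NA appears even without any of the continuous regressors, so the problem lies in the trend and the dummies themselves.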

I do not know why this is happening, as all of the x_i variables are continuous and, as far as I can tell, the time trend and the dummies are not linearly dependent.

Any help as to why this might be happening would be much appreciated!


Solution

  • The problem is that when you code the year trend as 1:n and also include a dummy variable for every year except the baseline, the resulting covariate matrix does not have full column rank.

    For example, suppose there are only 3 categories, r1, r2 and r3, and the model is y ~ trend + c2 + c3. The covariate matrix (including the intercept column int) is:

    > mat
         int trend c2 c3
    [1,]   1     1  0  0
    [2,]   1     1  0  0
    [3,]   1     2  1  0
    [4,]   1     2  1  0
    [5,]   1     3  0  1
    [6,]   1     3  0  1
    

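    The column rank of mat is only 3, not 4 (the number of coefficients to estimate), because trend = int + c2 + 2*c3 holds in every row. Hence t(mat) %*% mat is singular and the four coefficients cannot all be identified. lm() handles this by dropping the column it finds to be linearly dependent on the columns before it (c3 here) and reporting that coefficient as NA. Your 7-year model has the same exact dependency, timetrend = 1 + yeardummy2001 + 2*yeardummy2002 + ... + 6*yeardummy2006, which is why yeardummy2006 comes out as NA.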
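The rank argument can be checked numerically. This is a small sketch that rebuilds the 3-category matrix `mat` shown above, computes its rank with `qr()`, verifies the exact dependency, and then checks the same identity for a 7-year trend. The 7-year part uses two hypothetical observations per year; the helper names `XtX`, `trend7` and `dummies` are illustrative only.

```r
# The 3-category example: intercept, trend (1, 2, 3) and dummies c2, c3,
# with two observations per category, as in the matrix above
mat <- cbind(
  int   = 1,
  trend = rep(1:3, each = 2),
  c2    = rep(c(0, 1, 0), each = 2),
  c3    = rep(c(0, 0, 1), each = 2)
)
print(mat)

qr(mat)$rank    # 3, although there are 4 coefficients to estimate

# The exact dependency: trend = int + c2 + 2 * c3 in every row
all(mat[, "trend"] == mat[, "int"] + mat[, "c2"] + 2 * mat[, "c3"])   # TRUE

# Hence the cross-product t(mat) %*% mat is singular as well
XtX <- crossprod(mat)
qr(XtX)$rank    # 3, not 4

# Same identity for 7 years (1 = 2000, ..., 7 = 2006), two observations per
# year: trend = 1 + 1*d2001 + 2*d2002 + ... + 6*d2006
trend7  <- rep(1:7, each = 2)
dummies <- sapply(2:7, function(k) as.numeric(trend7 == k))   # 14 x 6 matrix
all(trend7 == 1 + drop(dummies %*% (1:6)))   # TRUE
```

Because the dependency holds row by row, it does not depend on the number of observations per year or on the other regressors in the model.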