Tags: spss, mixed-models, dummy-variable

SPSS version 23, MIXED module: maximum dummy variables?


I am using the MIXED routine for repeated measures. I have 10 dummy variables (0/1) and 8 scaled variables as fixed effects. The results keep showing that one of the dummy variables is redundant. I tried changing the order in which the dummy and scaled variables are listed; usually the last-listed dummy variable is the one flagged as redundant. Is there a maximum number of dummy variables that should be included in the model? Eight of the dummy variables refer to 8 geographical regions of a country.


Solution

  • To understand why SPSS 'kicks out' one of the dummy variables, you should look at the origin of these dummies.

    Let's say we have a dependent variable y measured on a sample of objects. These objects come from 8 regions, recorded in a variable x. In a flat regression model, we model the relation between y and x:

    y = a + bx + e.

    We want to know the value of b. But x is a nominal variable, so the categories or regions are not numbers, but names. Names don't fit in the above equation.

    You have probably recoded x into dummies x1 through x8. Now look at the records in your data and their scores for x and the dummy variables. Here's an example of one record:

    x   x1  x2  x3  x4  x5  x6  x7  x8 
    8    0   0   0   0   0   0   0   1  
    

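    If you go through the dummy variables one by one and reach x7, you know that the first 7 are all zeroes. For this record, you therefore already know that x8 must be 1. The same holds for every record: each object belongs to exactly one region, so the eight dummies always add up to 1, which is exactly the constant a that is already in the model. This is what SPSS means when it 'kicks out' a redundant variable. The phenomenon is called perfect collinearity (also known as the dummy variable trap). The information in the last dummy you add to the model is redundant, because it is already in there.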
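    The redundancy can be checked directly on the data. The following is a minimal Python sketch, not SPSS syntax; the list `regions` and the helper `dummy_code` are made up for illustration. It shows that x8 can be rebuilt from the other seven dummies for every record.

```python
# Hypothetical records: the region code (1..8) of each object in the sample.
regions = [8, 3, 1, 5, 8, 2, 7, 4, 6]

def dummy_code(region, n_categories=8):
    """Return the 0/1 dummies x1..x8 for one record (illustrative helper)."""
    return [1 if region == k else 0 for k in range(1, n_categories + 1)]

rows = [dummy_code(r) for r in regions]

# For every record the dummies sum to 1, the same value as the constant term.
row_sums = [sum(row) for row in rows]

# So the last dummy can be reconstructed from the constant and the first seven.
x8_rebuilt = [1 - sum(row[:7]) for row in rows]
x8_actual = [row[7] for row in rows]

print(rows[0])                  # [0, 0, 0, 0, 0, 0, 0, 1]
print(x8_rebuilt == x8_actual)  # True
```

    Because x8 is an exact linear combination of columns already in the model, no estimation routine can assign it a unique coefficient, whichever dummy happens to be listed last.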

    In conclusion: leave out one of the region dummies. The dummy variable you leave out will serve as the reference category in your model. For each of the other dummies, the model estimates a coefficient that tells you how much the records or objects in that category of x differ from the reference category that was left out.
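    To make the reference-category reading concrete, here is a small Python sketch with invented data for three regions instead of eight. It assumes a model that contains only the constant and the remaining dummies; with other predictors in the model, as in the question, the coefficients become adjusted differences rather than plain differences in means.

```python
# Hypothetical sample: outcome y for objects from three regions (A, B, C).
data = [("A", 10), ("A", 12), ("B", 15), ("B", 17), ("C", 7), ("C", 9)]
reference = "C"  # the dummy for this region is left out of the model

def group_mean(region):
    values = [y for r, y in data if r == region]
    return sum(values) / len(values)

# With only a constant and the remaining dummies in the model, the
# least-squares constant is the mean of the reference region ...
intercept = group_mean(reference)
# ... and each dummy's coefficient is that region's mean minus the reference mean.
others = sorted({r for r, _ in data} - {reference})
coef = {r: group_mean(r) - intercept for r in others}

# Check that this is the least-squares fit: the residuals must be orthogonal
# to every column of the design matrix (the constant and each dummy).
residuals = [y - (intercept + coef.get(r, 0.0)) for r, y in data]
normal_eqs = [sum(residuals)] + [
    sum(e for (r, _), e in zip(data, residuals) if r == k) for k in others
]

print(intercept, coef)  # 8.0 {'A': 3.0, 'B': 8.0}
print(normal_eqs)       # [0.0, 0.0, 0.0]
```

    Region A scores 3 points above the reference region C and region B scores 8 points above it; C itself is absorbed into the constant.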

    There are also other coding schemes that use the mean as the reference instead of one of the categories. Take a look at dummy coding on Wikipedia.
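    One such scheme is usually called effect (or deviation) coding; that it is the scheme meant here is my assumption. A minimal Python sketch with invented data for three regions: records from the omitted region are coded -1 on every column, and the coefficients become deviations from the unweighted mean of the region means.

```python
# Hypothetical sample: outcome y for objects from three regions (A, B, C).
data = [("A", 10), ("A", 12), ("B", 15), ("B", 17), ("C", 8), ("C", 10)]
levels = ["A", "B", "C"]
omitted = levels[-1]  # still gets no column of its own

def effect_code(region):
    """Columns for A and B; records from the omitted region get -1 everywhere."""
    if region == omitted:
        return [-1] * (len(levels) - 1)
    return [1 if region == k else 0 for k in levels[:-1]]

def group_mean(region):
    values = [y for r, y in data if r == region]
    return sum(values) / len(values)

means = {k: group_mean(k) for k in levels}
# Under this coding the constant is the unweighted mean of the region means ...
intercept = sum(means.values()) / len(levels)
# ... and each coefficient is a region's deviation from that mean.
coef = [means[k] - intercept for k in levels[:-1]]

# Fitted values reproduce the region means, including the omitted region,
# whose effect is minus the sum of the other coefficients.
fitted = {
    k: intercept + sum(b * c for b, c in zip(coef, effect_code(k)))
    for k in levels
}

print(intercept, coef)  # 12.0 [-1.0, 4.0]
print(fitted == means)  # True
```

    There is still one column fewer than there are categories, so the collinearity problem is avoided in the same way; only the interpretation of the coefficients changes.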

    I also like this article that explains how degrees of freedom work. Although I hadn't mentioned this term before, it does touch on the very same idea of how dummy coding works.