Tags: r, fixest, multicollinearity

Exclude the interacted variable to avoid collinearity dropping variables in an FE regression (R)


I am running a regression such as the following:

library(fixest)
reg <- feols(data = df, outcome ~ interaction * (typeA + control1 + control2 + control3) + interaction * (typeB + control1 + control2 + control3) +
               interaction * (typeC + control1 + control2 + control3) + interaction * (typeD + control1 + control2 + control3) +
               interaction * (typeE + control1 + control2 + control3) | area + year, vcov = "Cluster")

Here typeA to typeE are mutually exclusive dummy variables: together they form one large factor variable, split into its dummy components. Every observation in the regression belongs to exactly one of these categories. The controls are continuous variables.
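Because the five dummies are mutually exclusive and exhaustive, they sum to exactly 1 in every row, so the full set is collinear with any constant column, including a complete set of fixed effects such as area. A minimal base-R sketch on simulated, hypothetical data (the `type` vector and the sample size are made up for illustration):

```r
# Hypothetical simulated data: every row belongs to exactly one of five types
set.seed(1)
n <- 500
type <- factor(sample(c("typeA", "typeB", "typeC", "typeD", "typeE"), n, replace = TRUE))

# One 0/1 column per level (no intercept), mirroring the typeA..typeE dummies
dummies <- model.matrix(~ type - 1)

# Mutually exclusive and exhaustive: each row sums to exactly 1
row_totals <- rowSums(dummies)
all(row_totals == 1)  # TRUE

# Adding a constant column (an intercept, or the sum of a full set of
# fixed-effect dummies) makes the design rank-deficient by one
X <- cbind(intercept = 1, dummies)
rank_X <- qr(X)$rank
ncol(X)  # 6
rank_X   # 5
```

This is why, alongside the fixed effects, at most four of the five type dummies can be estimated.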

Running the regression consistently produces errors such as:

The variable 'interaction:typeB' has been removed because of collinearity (see $collin.var).

I tried recoding the dummy variables as one factor variable, but this merely changed the error to:

The variable 'interaction:cat_variable::typeB' has been removed because of collinearity (see $collin.var).

The typeX variables are not collinear with any of the controls, nor with each other (they are mutually exclusive), but each interaction:typeX variable is by definition collinear with the two variables it interacts.
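It may help to check where the dependency actually comes from. A product of two variables is generally not a linear combination of them, so no single interaction:typeX column is collinear with its own components. The problem is the set as a whole: since `interaction` is 0 on every typeA row and each row has exactly one type, interaction:typeA is identically zero and interaction:typeB + ... + interaction:typeE adds back up to `interaction` itself. A base-R sketch on hypothetical simulated data:

```r
set.seed(2)
n <- 500
levels_type <- c("typeA", "typeB", "typeC", "typeD", "typeE")
type <- sample(levels_type, n, replace = TRUE)
# Hypothetical continuous 'interaction' that is 0 for every typeA row
interaction <- ifelse(type == "typeA", 0, rnorm(n))

# 0/1 dummies and their products with 'interaction' (the interaction:typeX columns)
dummies  <- sapply(levels_type, function(lvl) as.numeric(type == lvl))
products <- dummies * interaction  # recycles 'interaction' down each column

# interaction:typeA is identically zero ...
all(products[, "typeA"] == 0)  # TRUE
# ... and interaction:typeB + ... + interaction:typeE reproduces 'interaction'
all.equal(rowSums(products[, -1]), interaction)  # TRUE

# So 'interaction' plus the four nonzero products span only 4 dimensions, not 5
qr(cbind(interaction, products[, -1]))$rank  # 4
```

Whichever of these five columns comes last in the design is the one a regression routine will flag and drop.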

I am therefore wondering whether it is possible to exclude the interaction variable on its own, so that the regression runs without this collinearity. I need the estimates for all of the typeX interactions and their non-interacted estimates, with the exception of interaction:typeA, because typeA can never be interacted (interaction == 0 for all typeA rows). However, setting typeA as the reference level did not prevent this error.
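Choosing typeA as the reference level does not remove the dependency, because `interaction * type` still expands to the `interaction` main effect plus interaction:typeB to interaction:typeE, and those four products sum to the main effect. A sketch using base R's `lm()` in place of `feols()` (so it runs without fixest) on hypothetical simulated data; `lm()` reports the aliased term as an `NA` coefficient instead of printing a note:

```r
set.seed(3)
n <- 500
# Hypothetical data: a single factor with typeA as the reference level
type <- relevel(factor(sample(c("typeA", "typeB", "typeC", "typeD", "typeE"),
                              n, replace = TRUE)), ref = "typeA")
interaction <- ifelse(type == "typeA", 0, rnorm(n))  # never nonzero for typeA
outcome <- rnorm(n)

# Main effects plus all type-specific interactions, typeA as reference
fit <- lm(outcome ~ interaction * type)

# Exactly one coefficient is aliased (NA), despite the releveling
sum(is.na(coef(fit)))  # 1
names(coef(fit))[is.na(coef(fit))]
```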

If it is impossible to exclude the interaction term by itself, or if that would not fix the loss of estimates, is there another way to overcome this multicollinearity? The whole purpose of the regression is the interaction term by type, so the multicollinearity with the interaction term seems inherent. I would be grateful for all suggestions.

I regret that I cannot share code or sample data, as they are both too large and too sensitive to share at present.


Solution

  • This can be solved by rewriting the formula as follows:

    library(fixest)
    reg <- feols(data = df, outcome ~ interaction:(typeB + control1 + control2 + control3) + interaction:(typeC + control1 + control2 + control3) +
                   interaction:(typeD + control1 + control2 + control3) + interaction:(typeE + control1 + control2 + control3) +
                   control1 + control2 + control3 | area + year, vcov = "Cluster")
    

    Here typeA serves as the reference category: it is simply left out of the formula, so the estimates are marginal increases over the typeA baseline (where typeB to typeE are all 0).

    No variables are dropped for collinearity any more, and all of the type-specific interaction effects (typeB to typeE) are estimated.
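A base-R sketch of the rewritten specification on hypothetical simulated data. It uses `lm()` with `factor(area) + factor(year)` as a stand-in for the `| area + year` fixed effects so that it runs without fixest (`feols()` absorbs the fixed effects instead of estimating them, but the remaining point estimates are the same). A single control stands in for control1 to control3, and the true type-specific slopes of `interaction` (1 to 4) are made up so the estimates can be checked against them:

```r
set.seed(4)
n <- 2000
levels_type <- c("typeA", "typeB", "typeC", "typeD", "typeE")
df <- data.frame(
  type     = factor(sample(levels_type, n, replace = TRUE), levels = levels_type),
  area     = factor(sample(1:20, n, replace = TRUE)),
  year     = factor(sample(2001:2010, n, replace = TRUE)),
  control1 = rnorm(n)
)
df$interaction <- ifelse(df$type == "typeA", 0, rnorm(n))
# 0/1 dummies for typeB..typeE (typeA is the omitted reference)
for (lvl in levels_type[-1]) df[[lvl]] <- as.numeric(df$type == lvl)

# Hypothetical true slopes of 'interaction' for typeB..typeE
true_slopes <- c(typeB = 1, typeC = 2, typeD = 3, typeE = 4)
type_slope  <- drop(as.matrix(df[names(true_slopes)]) %*% true_slopes)  # 0 for typeA rows
df$outcome  <- df$interaction * type_slope + 0.5 * df$control1 + rnorm(n, sd = 0.5)

# Same structure as the rewritten formula: no 'interaction' main effect,
# fixed effects replaced by factor dummies
fit <- lm(outcome ~ interaction:(typeB + typeC + typeD + typeE + control1) + control1 +
            factor(area) + factor(year), data = df)

est <- coef(fit)[paste0("interaction:", names(true_slopes))]
anyNA(coef(fit))  # FALSE: nothing is dropped for collinearity
round(est, 2)     # close to the simulated slopes 1, 2, 3, 4
```

Because `interaction` is never nonzero for typeA, each interaction:typeX coefficient here is simply the slope of `interaction` within that type.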