I am currently running a Poisson regression with college applications as the dependent variable and gender and race as the two independent variables.
I included a full factorial interaction between gender and race so that I could 1) observe their main effects and 2) observe their interaction effects. Whenever I include the interaction term in the Poisson regression, Stata outputs incidence rate ratios (IRRs) that appear normal, but the corresponding lower and upper limits of the confidence intervals are identical to each other.
I expected that the limits of my 95% confidence intervals (CIs) would not be equal to one another or their corresponding IRR. When I remove the interaction between the independent variables, the CIs appear as one would generally expect.
My code is as follows:
`glm apps i.race##i.gender, family(poisson) link(log) exposure(realpyears) vce(robust) eform`
The output is as follows:
```
Iteration 0:   log pseudolikelihood = -183.25337
Iteration 1:   log pseudolikelihood = -54.374792
Iteration 2:   log pseudolikelihood = -39.020998
Iteration 3:   log pseudolikelihood = -37.736794
Iteration 4:   log pseudolikelihood = -37.702256
Iteration 5:   log pseudolikelihood = -37.702146
Iteration 6:   log pseudolikelihood = -37.702146

Generalized linear models                        Number of obs     =         12
Optimization     : ML                            Residual df       =         12
                                                 Scale parameter   =          1
Deviance         =  5.04042e-14                  (1/df) Deviance   =   4.20e-15
Pearson          =  5.19076e-20                  (1/df) Pearson    =   4.33e-21

Variance function: V(u) = u                      [Poisson]
Link function    : g(u) = ln(u)                  [Log]

                                                 AIC               =   6.283691
Log pseudolikelihood = -37.7021456               BIC               =  -29.81888

---------------------------------------------------------------------------------
                |               Robust
           apps |       IRR   std. err.      z    P>|z|     [95% conf. interval]
----------------+----------------------------------------------------------------
           race |
          AI/AN |  7.571374   3.26e-13  4.7e+13   0.000     7.571374    7.571374
          Asian |  .9290156   5.49e-16 -1.2e+14   0.000     .9290156    .9290156
          Black |  1.345809   6.37e-16  6.3e+14   0.000     1.345809    1.345809
       Hispanic |   1.94419   6.14e-16  2.1e+15   0.000      1.94419     1.94419
          NHoPI |  1.991199   3.35e-10  4.1e+09   0.000     1.991199    1.991199
                |
         gender |
          Woman |   .688629   1.66e-16 -1.5e+15   0.000      .688629     .688629
                |
    race#gender |
    AI/AN#Woman |  1.015982   1.73e-13  9.3e+10   0.000     1.015982    1.015982
    Asian#Woman |  .9747926   6.28e-16 -4.0e+13   0.000     .9747926    .9747926
    Black#Woman |   .710833   4.05e-16 -6.0e+14   0.000      .710833     .710833
 Hispanic#Woman |  .6341218   2.43e-16 -1.2e+15   0.000     .6341218    .6341218
    NHoPI#Woman |  2.381544   4.01e-10  5.2e+09   0.000     2.381544    2.381544
                |
          _cons |   .002722   4.98e-19 -3.2e+16   0.000      .002722     .002722
 ln(realpyears) |         1  (exposure)
---------------------------------------------------------------------------------
```
The problem is statistical; it has nothing to do with your use of Stata or your Stata code. You are throwing a complicated model with several free parameters at a tiny dataset, and the fit is almost inevitably excellent. What you see are side effects of a model that essentially interpolates the data.
Look at your standard errors: they are all on the order of 1 in 10 billion, or very much smaller. Hence your confidence intervals are just very, very short, and the confidence limits are in fact slightly different from each other; they only look identical at the precision Stata displays.
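You can verify this with the first IRR row of your own output. A rough sketch in Python, using the displayed (delta-method) standard error and a Wald-style ±1.96 × SE interval as an approximation to what Stata computes:

```python
# Displayed IRR and robust std. err. from the AI/AN row of the output above.
irr = 7.571374
se = 3.26e-13

# Approximate 95% limits on the IRR scale: IRR +/- 1.96 * SE.
lower = irr - 1.96 * se
upper = irr + 1.96 * se

# The limits genuinely differ...
print(upper - lower)                  # roughly 1.3e-12

# ...but not within the 6 decimal places Stata prints.
print(f"{lower:.6f}", f"{upper:.6f}")
```

The interval is about a trillionth of a unit wide, so both limits round to 7.571374 at displayed precision. (Stata's `eform` limits are actually the exponentiated limits of the coefficient's interval, but at these magnitudes the conclusion is the same.)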
The phenomenon is an extension of the fact that two data points in the plane define a straight line uniquely. You usually need many more data points than parameters being estimated, so you need a much larger dataset to assess this model seriously; the Catch-22 is that omitting the interactions would likely just leave you with a simplistic model.
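The two-points-define-a-line analogy can be made concrete in a few lines of plain Python (the numbers are hypothetical): with as many free parameters as observations, the fitted line passes through every point and the residuals are exactly zero regardless of the data.

```python
# Two data points, two free parameters (slope and intercept): a saturated fit.
x = [1.0, 3.0]
y = [2.0, 8.0]

# Solve for the unique line through both points.
slope = (y[1] - y[0]) / (x[1] - x[0])
intercept = y[0] - slope * x[0]

# The "model" reproduces the data exactly; every residual is zero.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(residuals)  # [0.0, 0.0]
```

A third point off this line would force nonzero residuals. With 12 observations and 12 estimated parameters (constant, 5 race effects, 1 gender effect, 5 interactions), the Poisson model above is in the same interpolating position, which is why the deviance is essentially zero.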