I started running into the error (converted from warning):
glm.fit (or glm.fit2): fitted probabilities numerically 0 or 1 occurred
I found this link referencing linear separation of data:
[R] glm.fit: "fitted probabilities numerically 0 or 1 occurr
So I tried hunting through the data and found a small reproducible example from a small subset of the data (both glm and glm2) where I don't actually see the linear separation and yet I get the error:
response = c(0,1,0,1,0,0,0,0,0,0)
dependent = c(133,571,1401,4930,3134075,44357054,1718619387,1884020779,8970035092,9392823637)
foo = data.frame(y=response,x=dependent)
glm(y ~ x, family=binomial, data=foo)
I can avoid the issue by transforming the predictor (the variable named dependent above) via log(x+1); however, this is monotonic and doesn't alter the ordering, so I'm not sure why that helps or whether I should be doing so. The values are "microseconds since the last time some event happened", which is why some of them can be large. I tried turning the variable into a two-level factor (recent, not recent), but that loses information and underperforms the raw values.
I think this is just a feature of the data, combined with the rounding of the floating-point calculations that happen while the likelihood is being maximized.
Take a look at the fitted values of the log transformed set:
> response = c(0,1,0,1,0,0,0,0,0,0)
> dependent = c(133,571,1401,4930,3134075,44357054,1718619387,1884020779,8970035092,9392823637)
>
> foo = data.frame(y=response,x=log(dependent))
> mlog <- glm(y ~ x, family=binomial, data=foo)
> mlog$fitted
          1           2           3           4
0.584089292 0.484155299 0.422713978 0.340825478
          5           6           7           8
0.079815887 0.040011202 0.014931996 0.014562755
          9          10
0.009506656 0.009387457
Whereas the untransformed set results in minuscule fitted values:
> foo = data.frame(y=response,x=dependent)
> m <- glm(y ~ x, family=binomial, data=foo)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> m$fitted.values
           1            2            3
5.007959e-01 5.005387e-01 5.000511e-01
           4            5            6
4.979784e-01 6.359085e-04 2.220446e-16
           7            8            9
2.220446e-16 2.220446e-16 2.220446e-16
          10
2.220446e-16
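To see which observations trigger the warning, here is a rough sketch that refits the untransformed model and compares each fitted probability with a cutoff. The cutoff of 10 * .Machine$double.eps is my understanding of what glm.fit checks against, so treat it as an assumption; the names eps, mu, eta and below_eps are just for illustration.

```r
# Data from the question
response  <- c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0)
dependent <- c(133, 571, 1401, 4930, 3134075, 44357054, 1718619387,
               1884020779, 8970035092, 9392823637)
foo <- data.frame(y = response, x = dependent)

# Refit quietly; the warning itself is what we are investigating
m <- suppressWarnings(glm(y ~ x, family = binomial, data = foo))

# Assumed cutoff for "numerically 0 or 1" (not taken from the original post)
eps <- 10 * .Machine$double.eps

mu  <- fitted(m)                    # fitted probabilities
eta <- predict(m, type = "link")    # linear predictor on the logit scale

# Flag the observations whose fitted probability is below the cutoff
below_eps <- mu < eps
print(data.frame(x = foo$x, eta = eta, mu = mu, below_eps = below_eps))
```

The eta column should make the cause visible: the predictor spans many orders of magnitude, so a single slope on the raw scale pushes the linear predictor for the largest x values far into the negative range.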
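This doesn't seem to be a warning related to complete (or quasi-) separation of the data. Several fitted probabilities are pinned at 2.220446e-16, which is machine epsilon for doubles, so they really are numerically 0. I think the warning is pretty informative in this case.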
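As a quick sanity check on the separation point, a minimal sketch for the single-predictor case: the classes are separated only if every x for one outcome lies on one side of every x for the other. The names x0, x1, complete_sep and quasi_sep are made up for this example.

```r
# Data from the question
response  <- c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0)
dependent <- c(133, 571, 1401, 4930, 3134075, 44357054, 1718619387,
               1884020779, 8970035092, 9392823637)

x1 <- dependent[response == 1]   # predictor values where y == 1
x0 <- dependent[response == 0]   # predictor values where y == 0

# Complete separation: the two groups do not overlap at all
complete_sep <- max(x1) < min(x0) || max(x0) < min(x1)

# Quasi-complete separation: the groups touch only at a boundary value
quasi_sep <- max(x1) <= min(x0) || max(x0) <= min(x1)

print(c(complete = complete_sep, quasi = quasi_sep))
```

Both checks come out FALSE here, since the y == 1 values (571 and 4930) sit between y == 0 values (133 and 1401 on one side, everything larger on the other).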