Removing completely separated observations from glm()

I'm doing a bit of exploratory data analysis using HMDA data from the AER package; however, the variables that I used to fit the model seem to contain some observations that perfectly determine the outcomes, an issue known as "separation." So I tried to remedy this using the solution recommended by this thread, yet when I tried to execute the first set of source code from glm.fit(), R returned an error message:

Error in family$family : object of type 'closure' is not subsettable

so I could not proceed any further to remove those fully determined observations from my data with this code. I am wondering if anyone could help me fix this?

My current code is provided at below for your reference.

# load the AER package and HMDA data
library(AER)
data(HMDA)

# fit a 2-degree olynomial probit model 
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA)

# using the revised source code from that stackexchage thread to find out observations that received a warning message
library(tidyverse)
library(dplyr)
library(broom)

eps <- 10 * .Machine$double.eps
if (family$family == "binomial") {
  if (any(mu > 1 - eps) || any(mu < eps)) 
    warning("glm.fit: fitted probabilities numerically 0 or 1 occurred", 
            call. = FALSE)
}

# this return the following error message
# Error in family$family : object of type 'closure' is not subsettable


probit.resids <- augment(probit.fit) %>%
  mutate(p = 1 / (1 + exp(-.fitted)),
         warning = p > 1-eps)

arrange(probit.resids, desc(.fitted)) %>%  
  select(2:5, p, warning) %>% 
  slice(1:10)


HMDA.nwarning <- filter(HMDA, !probit.resids$warning)

# using HMDA.nwarning should solve the problem...
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA.nwarning)

Solution

This chunk of code

if (family$family == "binomial") {
  if (any(mu > 1 - eps) || any(mu < eps)) 
    warning("glm.fit: fitted probabilities numerically 0 or 1 occurred", 
            call. = FALSE)
}

there is a function, binomial() called when you run glm with family == "binomial". If you look under glm (just type glm):

if (is.character(family)) 
        family <- get(family, mode = "function", envir = parent.frame())
    if (is.function(family)) 
        family <- family()
    if (is.null(family$family)) {
        print(family)
        stop("'family' not recognized")
    }

And the glm function checks binomial()$family during the fit, and if any of the predicted values differ from 1 or 0 by eps, it raises that warning.

You don't need to run that part, and yes, you need to set eps <- 10 * .Machine$double.eps . So let's run the code below, and if you run a probit, you need to specify link="probit" in binomial, otherwise the default is logit:

library(AER)
library(tidyverse)
library(dplyr)
library(broom)

data(HMDA)

probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA)

eps <- 10 * .Machine$double.eps

probit.resids <- augment(probit.fit) %>%
  mutate(p = 1 / (1 + exp(-.fitted)),
         warning = p > 1-eps)

The column warning indicates if the observations raises a warning, in this dataset, there's one:

table(probit.resids$warning)

FALSE  TRUE 
 2379     1

We can use the next step to filter it

HMDA.nwarning <- filter(HMDA, !probit.resids$warning)
dim(HMDA.nwarning)
[1] 2379   14

And rerun the regression:

probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA.nwarning)
coefficients(probit.fit)
(Intercept) poly(hirat, 2)1 poly(hirat, 2)2 
      -1.191292        8.708494        6.884404