Error running GEE logistic Model: NA/NaN/Inf in foreign function call (arg 2)


I am running a logistic regression model implemented through generalized estimating equations (GEEs) and keep running into the following error despite trying multiple solutions posted here on SO and elsewhere. I am unsure from where this error arises. I am using the gee package but the error also occurs in geepack.

Does anyone know why this error occurs even though the dataset contains no NA, Inf, or character variables? I suspect I am missing something very simple, but after two days I have to hand it over to better coders than me.

Minimal data and code to reproduce the error, attempts at solutions, and relevant SO questions are below.


Data

df <- structure(list(id = structure(c(7L, 1L, 20L, 15L, 14L, 6L, 8L,  24L, 21L, 19L, 5L, 4L, 18L, 
                                13L, 23L, 16L, 25L, 12L, 10L, 9L,  22L, 17L, 11L, 3L, 2L, 2L), 
                              levels = c("ALWA28M", "BOMA13M", "BOMA41M",  "DAYA35M", "DEMB72M", "EDAB3WM", "EFCH52M", 
                                         "FASI6M", "FRRO35M",  "GRAS35F", "GRKA48M", "JARA35M", "KABA27M", "KECH4WM", 
                                         "MAAD60M",  "MACH33M", "MEBA29F", "MIGU42M", "MTSA10M", "NTMA22F", "RACA2M",  
                                         "STMA35M", "TOKE39M", "TRMA12M", "YOLU29M"), class = "factor"),      
               testres = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
                                         1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), 
                                       levels = c("POS", "NEG"), class = "factor"), 
               agegrp = structure(c(5L, 3L, 3L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 6L, 4L, 4L, 
                                    3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 2L, 2L), 
                                  levels = c("0", "1", "2", "3", "4", "5"), class = "factor")), 
          row.names = c(NA,  26L), 
          class = "data.frame")

Model

gee::gee(testres ~ agegrp, data = df, 
         id = id, 
         family = binomial, 
         corstr = "exchangeable")

Error

Error in gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, :
  NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, :
  NAs introduced by coercion

Checking data to ensure no NA, Inf, or character variables - all are factors with no missing data

# All factors
str(df)

  # 'data.frame':   26 obs. of  3 variables:
  # $ id     : Factor w/ 25 levels "ALWA28M","BOMA13M",..: 7 1 20 15 14 6 8 24 21 19 ...
  # $ testres: Factor w/ 2 levels "POS","NEG": 1 1 1 2 1 1 1 1 1 1 ...
  # $ agegrp : Factor w/ 6 levels "0","1","2","3",..: 5 3 3 5 1 1 2 2 1 2 ...

# No NAs or Infinites
lapply(df, table, useNA = "always")
  # 0 NAs
lapply(df, \(x) table(is.infinite(x)))
 # All FALSE

Alternative approach using geepack

geepack::geeglm(testres ~ agegrp,
                data = df, id = id,
                corstr = "exchangeable",
                family = "binomial")

geepack error:

Error in lm.fit(zsca, qlf(pr2), offset = soffset) : NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, mu) : ‘-’ not meaningful for factors

Changing the correlation structure yields the same error. A standard logistic regression on the same data runs without error:

summary(glm(testres ~ agegrp, data = df, family = binomial(link = "logit")))

The SO questions below did not resolve the issue. The error is common on the site, but I could not find an answer that covers this case, hence this post.

  1. How to eliminate "NA/NaN/Inf in foreign function call (arg 7)" running predict with randomForest
  2. R: NA/NaN/Inf in foreign function call (arg 1)
  3. Error in fitting a model with gee(): NA/NaN/Inf in foreign function call (arg 3)
  4. NA/NaN/Inf in foreign function call (arg 2)
  5. NA/NaN/Inf in foreign function call (arg 5)
  6. lme: NA/NaN/Inf in foreign function call (arg 3)
  7. NA/NaN/Inf in foreign function call (arg 1) when trying to run a PGLS (Pagel's lambda)
  8. How to eliminate “NA/NaN/Inf in foreign function call (arg 3)” in bigglm
  9. R error in glmnet: NA/NaN/Inf in foreign function call

Solution
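
A likely reason for the first error can be sketched in base R. gee seems to coerce the response to a double before handing it to its C routine, and a factor with labels such as "POS"/"NEG" is turned into text first, so the coercion yields only NA. This is an assumption about the package internals, not a quote of its source; the toy vector `testres` and the names `y_chr`, `y_num`, and `y01` are made up for illustration.

```r
# Toy response with the same levels as in the question (illustrative data)
testres <- factor(c("POS", "NEG", "POS"), levels = c("POS", "NEG"))

# as.matrix() on a factor returns a character matrix, not the integer codes
y_chr <- as.matrix(testres)
is.character(y_chr)

# Coercing that text to double gives NA together with the warning
# "NAs introduced by coercion" (suppressed here to keep the output clean)
y_num <- suppressWarnings(as.double(y_chr))
y_num

# An explicit 0/1 recoding avoids the problem (1 = "POS", 0 = "NEG")
y01 <- as.integer(testres == "POS")
y01
```

glm() accepts a factor response directly, which would explain why the ordinary logistic regression runs on the same data. It treats the first level (here POS) as failure, so coefficient signs flip relative to a fit on the 0/1 coding above.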

  • Coding testres as numeric 0/1 instead of a factor gets past the original error:

      df <- structure(list(id = structure(c(7L, 1L, 20L, 15L, 14L, 6L, 8L,  24L, 21L, 19L, 5L, 4L, 18L,
                                    13L, 23L, 16L, 25L, 12L, 10L, 9L,  22L, 17L, 11L, 3L, 2L, 2L),
                                  levels = c("ALWA28M", "BOMA13M", "BOMA41M",  "DAYA35M", "DEMB72M", "EDAB3WM", "EFCH52M",
                                             "FASI6M", "FRRO35M",  "GRAS35F", "GRKA48M", "JARA35M", "KABA27M", "KECH4WM",
                                             "MAAD60M",  "MACH33M", "MEBA29F", "MIGU42M", "MTSA10M", "NTMA22F", "RACA2M",
                                             "STMA35M", "TOKE39M", "TRMA12M", "YOLU29M"), class = "factor"),
                   testres = structure(c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
                                             1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L)),
                   agegrp = structure(c(5L, 3L, 3L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 6L, 4L, 4L,
                                        3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 2L, 2L),
                                      levels = c("0", "1", "2", "3", "4", "5"), class = "factor")),
              row.names = c(NA,  26L),
              class = "data.frame")
    
    gee::gee(testres ~ agegrp, data = df,
             id = id,
             family = binomial,
             corstr = "exchangeable")
    #> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
    #> running glm to get initial regression estimate
    #>   (Intercept)       agegrp1       agegrp2       agegrp3       agegrp4 
    #>  1.956607e+01 -3.377525e-08 -1.817977e+01 -1.831331e+01 -1.887292e+01 
    #>       agegrp5 
    #> -3.513736e-08
    #> Error in gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, : Cgee: error: logistic model for probability has fitted value very close to 1.
    #> estimates diverging; iteration terminated.
    

    The run now stops with a different error: the model has fitted some probabilities very close to 0 or 1. I think this is unrelated to the original problem (see the Details section of ?glm).
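
    The "fitted value very close to 1" message is consistent with separation: if an age group contains only one outcome, its fitted probability is pushed to the boundary. A quick cross-tabulation, as a sketch, makes this visible. The data frame is rebuilt here from the integer codes in the question so the snippet runs on its own; `agegrp_code`, `testres_code`, `testres01`, `tab`, and `one_outcome` are illustrative names.

```r
# Integer codes copied from the dput() output in the question
agegrp_code  <- c(5, 3, 3, 5, 1, 1, 2, 2, 1, 2, 6, 4, 4,
                  3, 4, 4, 3, 4, 4, 4, 4, 3, 5, 4, 2, 2)
testres_code <- c(1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
                  1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1)

df <- data.frame(
  agegrp  = factor(agegrp_code - 1, levels = 0:5),   # labels "0" to "5"
  testres = factor(c("POS", "NEG")[testres_code], levels = c("POS", "NEG"))
)

# Recode the factor in place instead of retyping the data (1 = POS, 0 = NEG)
df$testres01 <- as.integer(df$testres == "POS")

# Cross-tabulate outcome by age group; zero cells point to separation
tab <- table(agegrp = df$agegrp, testres = df$testres)
print(tab)

# Age groups in which only one outcome occurs
one_outcome <- rownames(tab)[rowSums(tab > 0) == 1]
one_outcome
#> [1] "0" "1" "5"
```

    Age groups 0, 1, and 5 contain only POS results, which matches the very large intercept (about 19.6) in the initial glm estimates printed above.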