
How do I fix the missing output (NA) in my summary/coefficients table in R?


I was building a logistic regression model in R, but when I checked the coefficients using summary(model), the output displayed NAs in all four columns (Estimate, Std. Error, z value and Pr(>|z|)) for one of my independent variables. My other three variables worked fine.

I also checked for null values, but there were none. I tried converting the variable between continuous and discrete types using as.numeric and as.integer, but it still comes out as NA in the output. The variable itself measures the total volume of blood donated.

I can't figure this out and it is bothering me. Thanks


Solution

  • Here is an example illustrating the problem; I'm using a simple linear model here, but the same principle applies to your logistic regression model.

    1. Let's generate some data for the model y = x1 + x2 + epsilon, where the two predictor variables x1 and x2 are perfectly linearly dependent: x2 = 2.5 * x1.

      # Generate sample data for y = x1 + x2 + noise, with x2 = 2.5 * x1
      set.seed(2017)
      x1 <- seq(1, 100)
      x2 <- 2.5 * x1
      y <- x1 + x2 + rnorm(100)
      
    2. We fit the model.

      df <- cbind.data.frame(x1 = x1, x2 = x2, y = y)
      fit <- lm(y ~ x1 + x2, data = df)
      
    3. Look at parameter estimates.

      summary(fit)
      #
      #Call:
      #lm(formula = y ~ x1 + x2, data = df)
      #
      #Residuals:
      #     Min       1Q   Median       3Q      Max
      #-2.50288 -0.75360 -0.01388  0.67935  3.08515
      #
      #Coefficients: (1 not defined because of singularities)
      #            Estimate Std. Error t value Pr(>|t|)
      #(Intercept) 0.166567   0.215534   0.773    0.441
      #x1          3.496831   0.003705 943.719   <2e-16 ***
      #x2                NA         NA      NA       NA
      #---
      #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      #
      #Residual standard error: 1.07 on 98 degrees of freedom
      #Multiple R-squared:  0.9999,   Adjusted R-squared:  0.9999
      #F-statistic: 8.906e+05 on 1 and 98 DF,  p-value: < 2.2e-16
      

    You can see that the estimates for x2 are NA. This is a direct consequence of x1 and x2 being linearly dependent: x2 is redundant, so lm drops it, and the data are fully described by the estimated model y = 3.4968 * x1 + epsilon. This is in good agreement with the theoretical coefficient, since x1 + 2.5 * x1 = 3.5 * x1.
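    Since your actual model is logistic, here is a sketch of the same failure mode with glm, plus two quick checks (cor and alias) you can run on your own data to find the dependent pair. The variable names and numbers below are purely illustrative — e.g. total volume being a fixed multiple of the number of donations, which is a common cause of this exact symptom in blood-donation data.

    ```r
    set.seed(2017)

    # Two perfectly collinear predictors: e.g. number of donations and
    # total volume donated, when every donation has a fixed volume.
    donations <- seq(1, 100)
    volume    <- 250 * donations        # exact linear dependence

    # Simulate a binary outcome for a logistic model (coefficients are arbitrary)
    p <- plogis(-2 + 0.04 * donations)
    donated_again <- rbinom(100, 1, p)

    df  <- data.frame(donations, volume, donated_again)
    fit <- glm(donated_again ~ donations + volume, family = binomial, data = df)

    summary(fit)                    # the 'volume' row shows NA, as in the lm example
    cor(df$donations, df$volume)    # exactly 1 for perfectly collinear predictors
    alias(fit)                      # lists 'volume' among the aliased coefficients

    # The fix: drop the redundant predictor and refit
    fit2 <- glm(donated_again ~ donations, family = binomial, data = df)
    coef(fit2)                      # no NAs
    ```

    In short, R is telling you that one of your predictors carries no information beyond the others; removing it (or replacing it with a variable that is not an exact function of the rest) makes the NAs go away.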