r, logistic-regression, glm

How to plot logistic binomial regression models with categorical and continuous variables?


I have data with continuous and categorical variables; the response variable is 1 or 0:

> str(test3)
'data.frame':   690 obs. of  7 variables:
 $ A1 : Factor w/ 3 levels "?","a","b": 3 2 2 3 3 3 3 2 3 3 ...
 $ A2 : num  30.8 58.7 24.5 27.8 20.2 ...
 $ A3 : num  0 4.46 0.5 1.54 5.62 ...
 $ A4 : Factor w/ 4 levels "?","l","u","y": 3 3 3 3 3 3 3 3 4 4 ...
 $ A8 : num  1.25 3.04 1.5 3.75 1.71 ...
 $ A11: int  1 6 0 5 0 0 0 0 0 0 ...
 $ A16: num  1 1 1 1 1 1 1 1 1 1 ...

How can I plot the model? Do I need to handle the categorical and continuous variables separately? I have tried this:

    mod3 <- glm(A16~., data=credit, family=binomial)
    mod3$coefficients
    summary(mod3)

But I received this warning:

glm.fit: fitted probabilities numerically 0 or 1 occurred 


head(test3, n=30)
   A1    A2     A3 A4     A8 A11 A16
1   b 30.83  0.000  u  1.250   1   1
2   a 58.67  4.460  u  3.040   6   1
3   a 24.50  0.500  u  1.500   0   1
4   b 27.83  1.540  u  3.750   5   1
5   b 20.17  5.625  u  1.710   0   1
6   b 32.08  4.000  u  2.500   0   1
7   b 33.17  1.040  u  6.500   0   1
8   a 22.92 11.585  u  0.040   0   1
9   b 54.42  0.500  y  3.960   0   1
10  b 42.50  4.915  y  3.165   0   1
11  b 22.08  0.830  u  2.165   0   1
12  b 29.92  1.835  u  4.335   0   1
13  a 38.25  6.000  u  1.000   0   1
14  b 48.08  6.040  u  0.040   0   1
15  a 45.83 10.500  u  5.000   7   1
16  b 36.67  4.415  y  0.250  10   1
17  b 28.25  0.875  u  0.960   3   1
18  a 23.25  5.875  u  3.170  10   1
19  b 21.83  0.250  u  0.665   0   1
20  a 19.17  8.585  u  0.750   7   1
21  b 25.00 11.250  u  2.500  17   1
22  b 23.25  1.000  u  0.835   0   1
23  a 47.75  8.000  u  7.875   6   1
24  a 27.42 14.500  u  3.085   1   1
25  a 41.17  6.500  u  0.500   3   1
26  a 15.83  0.585  u  1.500   2   1
27  a 47.00 13.000  u  5.165   9   1
28  b 56.58 18.500  u 15.000  17   1
29  b 57.42  8.500  u  7.000   3   1
30  b 42.08  1.040  u  5.000   6   1

Solution

  • Without a look at your full dataset, I can't say for sure what went wrong. I'm suspicious of the question marks showing up as factor levels, but none of the other oddities seem to matter. I mocked up a similar data set, and the model runs fine with or without na.omit.
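
    If the question marks turn out to be placeholders for missing values (an assumption, since the full data isn't shown), one way to recode them is sketched below. The `toy` data frame and its values are made up for illustration.

```r
# Hypothetical stand-in for the asker's data: "?" is assumed to mean "missing".
toy <- data.frame(
  A1  = factor(c("b", "a", "?", "b", "a", "b")),
  A4  = factor(c("u", "u", "y", "?", "l", "u")),
  A16 = c(1, 1, 0, 1, 0, 0)
)

# Find the factor columns, set "?" entries to NA, then drop the unused level.
is_fac <- vapply(toy, is.factor, logical(1))
toy[is_fac] <- lapply(toy[is_fac], function(f) {
  f[f == "?"] <- NA
  droplevels(f)
})

str(toy)
table(toy$A1, useNA = "ifany")
```

    If the data comes from a text file, passing `na.strings = "?"` to `read.csv` should achieve the same thing at import time.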

    The short answer is no: you don't have to do anything special to tell glm about the variable types...

    set.seed(2020)
    A1 <- factor(sample(letters[1:3], size = 100,replace = TRUE))
    A2 <- runif(100, min = 20, max = 70)
    A3 <- runif(100, min = 0, max = 10)
    A4 <- factor(sample(c("l", "u", "y", "x"), size = 100,replace = TRUE))
    A8 <- runif(100, min = 0, max = 20)
    A11 <- sample(0:20, size = 100, replace = TRUE)
    A16 <- as.numeric(sample(0:1, size = 100, replace = TRUE, prob = c(.1, .9)))
    credit <- data.frame(A1, A2, A3, A4, A8, A11, A16)
    str(credit)
    #> 'data.frame':    100 obs. of  7 variables:
    #>  $ A1 : Factor w/ 3 levels "a","b","c": 3 2 1 1 2 2 1 1 2 2 ...
    #>  $ A2 : num  38.8 54.1 29.1 23.3 32 ...
    #>  $ A3 : num  0.118 2.288 0.986 3.363 5.745 ...
    #>  $ A4 : Factor w/ 4 levels "l","u","x","y": 4 2 2 2 2 3 2 2 2 3 ...
    #>  $ A8 : num  8.85 17.94 4.42 2.88 14.77 ...
    #>  $ A11: int  4 2 13 2 20 18 20 20 9 18 ...
    #>  $ A16: num  1 1 1 1 1 1 0 1 1 1 ...
    mod3 <- glm(A16~., data=credit, family=binomial, na.action = na.omit)
    mod3
    #> 
    #> Call:  glm(formula = A16 ~ ., family = binomial, data = credit, na.action = na.omit)
    #> 
    #> Coefficients:
    #> (Intercept)          A1b          A1c           A2           A3          A4u  
    #>     0.37850     -0.49031     -0.52429      0.02990      0.07271      1.08706  
    #>         A4x          A4y           A8          A11  
    #>     1.05172      0.38511     -0.00192     -0.02511  
    #> 
    #> Degrees of Freedom: 99 Total (i.e. Null);  90 Residual
    #> Null Deviance:       69.3 
    #> Residual Deviance: 65.55     AIC: 85.55
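
    As for the plotting part of the question, one common approach (a sketch, not the only way) is to predict probabilities over a grid of one continuous variable, draw one curve per level of a categorical variable, and hold the remaining predictors fixed. The data below is simulated and the choice of A2 and A1 as the plotted variables is an assumption made for illustration.

```r
# Mock data in the same shape as the example above (values are made up).
set.seed(1)
n <- 200
credit <- data.frame(
  A1  = factor(sample(c("a", "b"), n, replace = TRUE)),
  A2  = runif(n, min = 20, max = 70),
  A11 = sample(0:20, n, replace = TRUE)
)
# Simulate a 0/1 response that depends on A2 and A1 so the curves are visible.
eta <- -3 + 0.06 * credit$A2 + 0.8 * (credit$A1 == "b")
credit$A16 <- rbinom(n, size = 1, prob = plogis(eta))

mod <- glm(A16 ~ ., data = credit, family = binomial)

# Prediction grid: vary A2, repeat for each level of A1, hold A11 at its median.
grid <- expand.grid(
  A2  = seq(min(credit$A2), max(credit$A2), length.out = 100),
  A1  = levels(credit$A1),
  A11 = median(credit$A11)
)
grid$p <- predict(mod, newdata = grid, type = "response")

# Observed 0/1 outcomes as points, fitted probability curves on top.
plot(credit$A2, credit$A16, pch = 16, col = "grey60",
     xlab = "A2", ylab = "Predicted probability of A16 = 1")
cols <- c("steelblue", "firebrick")
for (i in seq_along(levels(credit$A1))) {
  g <- grid[grid$A1 == levels(credit$A1)[i], ]
  lines(g$A2, g$p, col = cols[i], lwd = 2)
}
legend("bottomright", legend = levels(credit$A1), col = cols, lwd = 2,
       title = "A1")
```

    The same grid idea extends to any other continuous predictor: swap it in for A2 and fix A2 at a typical value instead.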