I have data with both continuous and categorical variables; the response variable is 1 or 0:
str(test3)
'data.frame': 690 obs. of 7 variables:
$ A1 : Factor w/ 3 levels "?","a","b": 3 2 2 3 3 3 3 2 3 3 ...
$ A2 : num 30.8 58.7 24.5 27.8 20.2 ...
$ A3 : num 0 4.46 0.5 1.54 5.62 ...
$ A4 : Factor w/ 4 levels "?","l","u","y": 3 3 3 3 3 3 3 3 4 4 ...
$ A8 : num 1.25 3.04 1.5 3.75 1.71 ...
$ A11: int 1 6 0 5 0 0 0 0 0 0 ...
$ A16: num 1 1 1 1 1 1 1 1 1 1 ...
How can I plot the model? Do I need to handle the categorical and continuous variables separately? I tried this:
mod3 <- glm(A16~., data=credit, family=binomial)
mod3$coefficients
summary(mod3)
But I received this warning:
glm.fit: fitted probabilities numerically 0 or 1 occurred
head(test3, n=30)
A1 A2 A3 A4 A8 A11 A16
1 b 30.83 0.000 u 1.250 1 1
2 a 58.67 4.460 u 3.040 6 1
3 a 24.50 0.500 u 1.500 0 1
4 b 27.83 1.540 u 3.750 5 1
5 b 20.17 5.625 u 1.710 0 1
6 b 32.08 4.000 u 2.500 0 1
7 b 33.17 1.040 u 6.500 0 1
8 a 22.92 11.585 u 0.040 0 1
9 b 54.42 0.500 y 3.960 0 1
10 b 42.50 4.915 y 3.165 0 1
11 b 22.08 0.830 u 2.165 0 1
12 b 29.92 1.835 u 4.335 0 1
13 a 38.25 6.000 u 1.000 0 1
14 b 48.08 6.040 u 0.040 0 1
15 a 45.83 10.500 u 5.000 7 1
16 b 36.67 4.415 y 0.250 10 1
17 b 28.25 0.875 u 0.960 3 1
18 a 23.25 5.875 u 3.170 10 1
19 b 21.83 0.250 u 0.665 0 1
20 a 19.17 8.585 u 0.750 7 1
21 b 25.00 11.250 u 2.500 17 1
22 b 23.25 1.000 u 0.835 0 1
23 a 47.75 8.000 u 7.875 6 1
24 a 27.42 14.500 u 3.085 1 1
25 a 41.17 6.500 u 0.500 3 1
26 a 15.83 0.585 u 1.500 2 1
27 a 47.00 13.000 u 5.165 9 1
28 b 56.58 18.500 u 15.000 17 1
29 b 57.42 8.500 u 7.000 3 1
30 b 42.08 1.040 u 5.000 6 1
Without a look at your full dataset, I'm perplexed. I'm suspicious of the question marks stored as factor levels, but none of the other oddities seem to matter. I mocked up a similar data set, and the model runs fine with or without na.omit.
The short answer is no: you don't have to do anything special to tell glm the variable types.
set.seed(2020)
A1 <- factor(sample(letters[1:3], size = 100, replace = TRUE))
A2 <- runif(100, min = 20, max = 70)
A3 <- runif(100, min = 0, max = 10)
A4 <- factor(sample(c("l", "u", "y", "x"), size = 100, replace = TRUE))
A8 <- runif(100, min = 0, max = 20)
A11 <- sample(0:20, size = 100, replace = TRUE)
A16 <- as.numeric(sample(0:1, size = 100, replace = TRUE, prob = c(.1, .9)))
credit <- data.frame(A1, A2, A3, A4, A8, A11, A16)
str(credit)
#> 'data.frame': 100 obs. of 7 variables:
#> $ A1 : Factor w/ 3 levels "a","b","c": 3 2 1 1 2 2 1 1 2 2 ...
#> $ A2 : num 38.8 54.1 29.1 23.3 32 ...
#> $ A3 : num 0.118 2.288 0.986 3.363 5.745 ...
#> $ A4 : Factor w/ 4 levels "l","u","x","y": 4 2 2 2 2 3 2 2 2 3 ...
#> $ A8 : num 8.85 17.94 4.42 2.88 14.77 ...
#> $ A11: int 4 2 13 2 20 18 20 20 9 18 ...
#> $ A16: num 1 1 1 1 1 1 0 1 1 1 ...
mod3 <- glm(A16~., data=credit, family=binomial, na.action = na.omit)
mod3
#>
#> Call: glm(formula = A16 ~ ., family = binomial, data = credit, na.action = na.omit)
#>
#> Coefficients:
#> (Intercept) A1b A1c A2 A3 A4u
#> 0.37850 -0.49031 -0.52429 0.02990 0.07271 1.08706
#> A4x A4y A8 A11
#> 1.05172 0.38511 -0.00192 -0.02511
#>
#> Degrees of Freedom: 99 Total (i.e. Null); 90 Residual
#> Null Deviance: 69.3
#> Residual Deviance: 65.55 AIC: 85.55
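
On the question marks: they look like missing values that were read in as an ordinary factor level, and a level with only a handful of rows is one common way to end up with that glm.fit warning. I can't confirm that is what is happening in your data, so treat this as a minimal sketch of how you might check and clean it up. The data frame toy and its values are made up for illustration and are not your data.

```r
# Hypothetical toy data shaped like the question's test3 (values invented)
toy <- data.frame(
  A4  = factor(c("u", "u", "y", "?", "l", "u", "y", "u")),
  A16 = c(1, 0, 1, 1, 1, 0, 0, 1)
)

# Cross-tabulate the factor against the response; a level whose rows
# all share one outcome (a zero cell) is a candidate for separation
tab <- table(toy$A4, toy$A16)
tab

# Recode "?" as a real missing value, then drop the now-empty level
toy$A4[toy$A4 == "?"] <- NA
toy$A4 <- droplevels(toy$A4)
levels(toy$A4)
sum(is.na(toy$A4))
```

With your data you would run the same table() call on test3$A1 and test3$A4 against test3$A16. Once "?" is a real NA, glm drops those rows under R's default na.action setting (na.omit), so you should not need to pass na.action yourself.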