What does NA in the odds ratio output mean?

I am currently working on landing page testing in which both the independent and dependent variables are logical. I want to check which of these variables, when true, is a major factor in a conversion.

So basically we are testing multiple variations of a single variable. For example, we have three different images: if image 1 is true for one row, the other two image variables are false.

I used logistic regression to conduct this test. When I looked at the odds ratio output, many of the values were NA. I am not sure how to interpret them or how to fix them.

Below is the sample dataset. The actual data has 18000+ rows.

classifier1 <- glm(formula = Target ~ .,
                   family = binomial,
                   data = Dataset)

This is the output.

Does this mean I need more data? Is there some other way to conduct multivariate landing page testing?

Solution

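It looks like two or more of your variables (columns) are perfectly collinear: one column is an exact linear combination of the others, so glm cannot estimate a separate coefficient for it and reports NA instead. Because an odds ratio is exp() of a coefficient, that NA carries over to the odds ratio. More rows will not help if the dependence is built into how the variables are coded. Try removing the redundant columns.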
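The design described in the question is a likely source of exactly this. If every row has exactly one of the three image columns set to TRUE, those columns always add up to 1, which duplicates the intercept column (often called the dummy-variable trap). The minimal sketch below uses simulated data; the column names image1, image2, image3 and converted are made up for illustration and are not the asker's real columns. It shows the NA odds ratio and two common ways around it.

```r
# Simulated stand-in for the question's data (all names are hypothetical):
# each row shows exactly one of three images, so
# image1 + image2 + image3 == 1 in every row, the same as the intercept column.
set.seed(42)
n <- 300
img <- sample(c("image1", "image2", "image3"), n, replace = TRUE)
d <- data.frame(image1 = img == "image1",
                image2 = img == "image2",
                image3 = img == "image3",
                converted = sample(c(TRUE, FALSE), n, replace = TRUE))

# All three indicators plus the intercept: one term is redundant,
# so its coefficient, and therefore its odds ratio, comes out NA.
full <- glm(converted ~ image1 + image2 + image3, family = binomial, data = d)
print(exp(coef(full)))

# Fix 1: drop one indicator. image3 becomes the baseline that the
# remaining odds ratios are compared against.
reduced <- glm(converted ~ image1 + image2, family = binomial, data = d)
print(exp(coef(reduced)))

# Fix 2: store the variation as one factor and let R create the dummies;
# the first level (image1) is the baseline here.
d$image <- factor(img)
by_factor <- glm(converted ~ image, family = binomial, data = d)
print(exp(coef(by_factor)))
```

Dropping one column does not lose information: its effect is absorbed into the intercept, and the remaining odds ratios are read relative to that baseline image. Both fixes describe the same fitted model, just with different baselines.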

You can see this with a toy data.frame filled with random values:

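n <- 20
# five random logical predictors named a..e (no seed is set, so results vary between runs)
y <- matrix(sample(c(TRUE, FALSE), 5 * n, replace = TRUE), ncol = 5)
colnames(y) <- letters[1:5]
z <- as.data.frame(y)
# alternating 0/1 response: 0, 1, 0, 1, ...
z$target <- rep(0:1, length.out = nrow(z))
m <- glm(target ~ ., data = z, family = binomial)
summary(m)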

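In the summary you can see that every coefficient is estimated; none is NA. (The very large standard errors on the intercept, bTRUE and dTRUE are a separate issue, most likely separation in such a small random sample, not collinearity.)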

Call:
glm(formula = target ~ ., family = binomial, data = z)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.89808  -0.48166  -0.00004   0.64134   1.89222  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -22.3679  4700.1462  -0.005   0.9962  
aTRUE          3.2286     1.6601   1.945   0.0518 .
bTRUE         20.2584  4700.1459   0.004   0.9966  
cTRUE          0.7928     1.3743   0.577   0.5640  
dTRUE         17.0438  4700.1460   0.004   0.9971  
eTRUE          2.9238     1.6658   1.755   0.0792 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 27.726  on 19  degrees of freedom
Residual deviance: 14.867  on 14  degrees of freedom
AIC: 26.867

Number of Fisher Scoring iterations: 18

But if you make two columns perfectly correlated, as below, and then fit the generalized linear model again:

z$a <- z$b  # column a is now an exact copy of column b
m <- glm(target ~ ., data = z, family = binomial)
summary(m)

you get NAs, as below:

Call:
glm(formula = target ~ ., family = binomial, data = z)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.66621  -1.01173   0.00001   1.06907   1.39309  

Coefficients: (1 not defined because of singularities)
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -18.8718  3243.8340  -0.006    0.995
aTRUE         18.7777  3243.8339   0.006    0.995
bTRUE              NA         NA      NA       NA
cTRUE          0.3544     1.0775   0.329    0.742
dTRUE         17.1826  3243.8340   0.005    0.996
eTRUE          1.1952     1.2788   0.935    0.350

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 27.726  on 19  degrees of freedom
Residual deviance: 19.996  on 15  degrees of freedom
AIC: 29.996

Number of Fisher Scoring iterations: 17
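To find out which columns are redundant in a real data set, a few base R checks help. The sketch below rebuilds a toy data frame like the one above; the fixed seed, n = 100 and the random 0/1 target are arbitrary choices for this sketch, not part of the original example. It lists the NA coefficients, compares the number of columns of the design matrix with its rank, asks alias() how the dropped term depends on the others, and refits without the duplicate.

```r
# Toy data like the example above; seed, n and the random target are arbitrary.
set.seed(7)
n <- 100
z <- as.data.frame(matrix(sample(c(TRUE, FALSE), 5 * n, replace = TRUE),
                          ncol = 5, dimnames = list(NULL, letters[1:5])))
z$target <- rbinom(n, 1, 0.5)
z$a <- z$b  # same duplication as above

m <- glm(target ~ ., data = z, family = binomial)

# 1. Which coefficients could not be estimated?
print(names(which(is.na(coef(m)))))   # "bTRUE"

# 2. The design matrix has fewer linearly independent columns than columns.
X <- model.matrix(m)
print(c(columns = ncol(X), rank = qr(X)$rank))

# 3. alias() shows how the dropped term is a combination of the others.
print(alias(m))

# 4. Refit without the redundant column; every coefficient is now estimated.
m2 <- update(m, . ~ . - b)
print(anyNA(coef(m2)))   # FALSE
```

Which of two duplicated columns gets the NA depends only on their order in the formula: R keeps the first one and flags the later one, which is why bTRUE, not aTRUE, is NA here and in the output above.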