I am currently working on landing page testing with both independent and dependent variables as logical variables. I wanted to check which of these variables, if true, is a major factor for a conversion.
So basically we are testing multiple variations of a single variable. For example, we have three different images, if image 1 is true for one row, the other two variables are false.
I used Logistic regression to conduct this test. When I looked at the odds ratio output, I ended up having a lot of NAs. I am not sure how to interpret them and how to rectify them.
Below is the sample dataset. The actual data has 18000+ rows.
classifier1 <- glm(formula = Target ~ .,
family = binomial,
data = Dataset)
This is the output.
Does this mean I need more data? Is there some other way to conduct multivariate landing page testing?
It looks like two or more of your variables (columns) are perfectly correlated. Try to remove several columns.
You can see it at the toy data.frame
with the random content:
n <- 20
y <- matrix(sample(c(TRUE, FALSE), 5 * n, replace = TRUE), ncol = 5)
colnames(y) <- letters[1:5]
z <- as.data.frame(y)
z$target <- rep(0:1, 2 * n)[1:nrow(z)]
m <- glm(target ~ ., data = z, family = binomial)
summary(m)
At the summary you can see that everything is OK.
Call:
glm(formula = target ~ ., family = binomial, data = z)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.89808 -0.48166 -0.00004 0.64134 1.89222
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -22.3679 4700.1462 -0.005 0.9962
aTRUE 3.2286 1.6601 1.945 0.0518 .
bTRUE 20.2584 4700.1459 0.004 0.9966
cTRUE 0.7928 1.3743 0.577 0.5640
dTRUE 17.0438 4700.1460 0.004 0.9971
eTRUE 2.9238 1.6658 1.755 0.0792 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 27.726 on 19 degrees of freedom
Residual deviance: 14.867 on 14 degrees of freedom
AIC: 26.867
Number of Fisher Scoring iterations: 18
But if you make two columns perfectly correlated as below, and then make generalized linear model:
z$a <- z$b
m <- glm(target ~ ., data = z, family = binomial)
summary(m)
you can observe NAs as below
Call:
glm(formula = target ~ ., family = binomial, data = z)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.66621 -1.01173 0.00001 1.06907 1.39309
Coefficients: (1 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -18.8718 3243.8340 -0.006 0.995
aTRUE 18.7777 3243.8339 0.006 0.995
bTRUE NA NA NA NA
cTRUE 0.3544 1.0775 0.329 0.742
dTRUE 17.1826 3243.8340 0.005 0.996
eTRUE 1.1952 1.2788 0.935 0.350
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 27.726 on 19 degrees of freedom
Residual deviance: 19.996 on 15 degrees of freedom
AIC: 29.996
Number of Fisher Scoring iterations: 17