So I have some data that is structured similarly to the following:
          | Works | DoesNotWork |
---------------------------------
Unmarried |   130 |         235 |
Married   |    10 |          95 |
I'm trying to use logistic regression to predict work status from marriage status, but I don't think I understand how to do it in R. For example, if my data looks like the following:
MarriageStatus | WorkStatus |
-----------------------------
Married        | No         |
Married        | No         |
Married        | Yes        |
Unmarried      | No         |
Unmarried      | Yes        |
Unmarried      | Yes        |
I understand that I could do the following:
log_model <- glm(WorkStatus ~ MarriageStatus, data=MarriageDF, family=binomial(logit))
I just don't understand how to do this when the data is summarized. Do I need to expand the data into a non-summarized form, encoding Married/Unmarried as 0/1 and Working/Not Working as 0/1?
Given only the first summary DF, how would I write the logistic regression glm function? Something like this?
log_summary_model <- glm(Works ~ DoesNotWork, data=summaryDF, family=binomial(logit))
But that doesn't make sense, as it splits the response (dependent) variable across two columns.
I'm not sure if I'm overcomplicating this. Any help would be greatly appreciated, thanks!
You need to reshape the contingency table into a data frame with one row per cell; the logit model can then be fitted using the frequency count as a weight variable:
mod <- glm(works ~ marriage, df, family = binomial, weights = freq)
summary(mod)
Call:
glm(formula = works ~ marriage, family = binomial, data = df,
    weights = freq)

Deviance Residuals:
      1        2        3        4
 16.383    6.858  -14.386   -4.361

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.5921     0.1093  -5.416 6.08e-08 ***
marriage     -1.6592     0.3500  -4.741 2.12e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 572.51  on 3  degrees of freedom
Residual deviance: 541.40  on 2  degrees of freedom
AIC: 545.4

Number of Fisher Scoring iterations: 5
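As a sanity check, the two estimates can be reproduced by hand from the four cell counts, since a model with a single binary predictor fits the 2x2 table exactly. The sketch below is only illustrative; the variable names are made up for this example and the counts are copied from the table in the question.

```r
# Cell counts copied from the contingency table in the question
works_unmarried <- 130; notworks_unmarried <- 235
works_married   <- 10;  notworks_married   <- 95

# Intercept: log-odds of working for the reference group (marriage = 0, unmarried)
intercept <- log(works_unmarried / notworks_unmarried)

# Slope: log odds ratio of working, married vs. unmarried
slope <- log(works_married / notworks_married) - intercept

# Standard error of a log odds ratio: sqrt of the summed reciprocal cell counts
se_slope <- sqrt(1 / works_unmarried + 1 / notworks_unmarried +
                 1 / works_married   + 1 / notworks_married)

# These should agree with the Estimate and Std. Error columns in the summary
round(c(intercept = intercept, marriage = slope, se_marriage = se_slope), 4)

# Back-transform to fitted probabilities of working in each group
plogis(intercept)          # unmarried: equals 130 / (130 + 235)
plogis(intercept + slope)  # married:   equals 10 / (10 + 95)
```

Exponentiating the slope, exp(slope), gives the odds ratio of working for married relative to unmarried people.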
Data:
df <- read.table(text = "works marriage freq
1 0 130
1 1 10
0 0 235
0 1 95", header = TRUE)
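As an alternative to the weights approach, glm also accepts a two-column response of the form cbind(successes, failures) for the binomial family, so the summary table from the question can be used more or less as it is. The sketch below assumes a data frame called summaryDF with columns MarriageStatus, Works and DoesNotWork (names borrowed from the question, not from real data), and also fits the fully expanded one-row-per-person data to show that the coefficients agree.

```r
# Summary table in the layout from the question (column names are assumptions)
summaryDF <- data.frame(
  MarriageStatus = factor(c("Unmarried", "Married"),
                          levels = c("Unmarried", "Married")),
  Works       = c(130, 10),
  DoesNotWork = c(235, 95)
)

# Two-column response: first column = successes, second column = failures
mod_cbind <- glm(cbind(Works, DoesNotWork) ~ MarriageStatus,
                 data = summaryDF, family = binomial)

# The same data expanded to one row per person, with a 0/1 response
cell_counts <- c(130, 235, 10, 95)
longDF <- data.frame(
  MarriageStatus = factor(
    rep(c("Unmarried", "Unmarried", "Married", "Married"), times = cell_counts),
    levels = c("Unmarried", "Married")),
  works = rep(c(1, 0, 1, 0), times = cell_counts)
)
mod_long <- glm(works ~ MarriageStatus, data = longDF, family = binomial)

# Both parameterisations should give the same estimates
coef(mod_cbind)
coef(mod_long)
```

The coefficients and standard errors are the same in all three forms (weights, cbind, expanded rows). The reported deviances and degrees of freedom differ because each form has a different number of rows, so they should not be compared across forms.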