Simple question here: I have the following data and I need to get it in a format where I can run a logistic regression on it.
library(dplyr)  # provides recode()

pvp <- rep(c("lib", "mod", "con"), 3)
pres <- c(rep("Bush", 3), rep("Clinton", 3), rep("Perot", 3))
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)
# data.frame() keeps count numeric; cbind() would coerce every column to character
df <- data.frame(pvp, pres, count, stringsAsFactors = FALSE)
# 1 = Clinton, 0 = anyone else
df$pres <- recode(df$pres, 'Clinton' = 1, 'Bush' = 0, 'Perot' = 0)
It looks like this:
> df
  pvp pres count
1 lib    0    70
2 mod    0   195
3 con    0   382
4 lib    1   324
5 mod    1   332
6 con    1   199
7 lib    0    56
8 mod    0   101
9 con    0   117
I need to run a logistic regression predicting pres from pvp. Normally I think I would just use spread() from the tidyverse to get the data into a wide format, but here I run into a problem using key = pvp in that call. I can't collapse the categories either, because some of the rows correspond to pres = 1 and some to pres = 0. How can I get the data into a format where I can run a logistic regression on it?
Thanks in advance.
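There is no need to expand the data. You can pass the counts through the "weights" argument of glm() when fitting the model, so that each row counts as many times as the number of respondents it represents.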
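# Each row is weighted by the number of respondents it represents
model_logit <- glm(pres ~ pvp, family = "binomial", weights = count, data = df)
# Predicted probability of pres = 1 (Clinton) for each pvp category
predictions <- predict(model_logit, newdata = data.frame(pvp = unique(df$pvp)), type = "response")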
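If you want to convince yourself that the weighted fit is equivalent to expanding the data, here is a rough sanity check in base R. It is only a sketch: it rebuilds the data from the question without dplyr, and the names fit_w, fit_long, long, obs and pred are made up for this illustration. It compares the weighted model against the same model fitted on one row per respondent, and against the observed share of Clinton votes in each pvp group.

```r
# Rebuild the question's data with base R only (pres: 1 = Clinton, 0 = otherwise)
df <- data.frame(
  pvp   = rep(c("lib", "mod", "con"), 3),
  pres  = rep(c(0, 1, 0), each = 3),
  count = c(70, 195, 382, 324, 332, 199, 56, 101, 117)
)

# Weighted fit on the 9 aggregated rows
fit_w <- glm(pres ~ pvp, family = binomial, weights = count, data = df)

# Same model on expanded data: repeat each row `count` times
long <- df[rep(seq_len(nrow(df)), times = df$count), c("pvp", "pres")]
fit_long <- glm(pres ~ pvp, family = binomial, data = long)

# Observed share of pres = 1 within each pvp group
obs <- tapply(df$count * df$pres, df$pvp, sum) / tapply(df$count, df$pvp, sum)

# Fitted probabilities from the weighted model, in the same group order
pred <- predict(fit_w, newdata = data.frame(pvp = names(obs)), type = "response")

# Compare up to a loose numerical tolerance (the two fits start IRLS differently)
same_coef <- isTRUE(all.equal(unname(coef(fit_w)), unname(coef(fit_long)), tolerance = 1e-6))
same_prob <- isTRUE(all.equal(unname(pred), as.vector(obs), tolerance = 1e-6))
print(c(same_coef = same_coef, same_prob = same_prob))
```

Because pvp is the only predictor and it is categorical, the fitted probabilities should simply reproduce the observed proportions in each group (for example 324 / 450 = 0.72 for lib), which is a handy way to check that the weights were picked up correctly.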