I have scoured the forum far and wide and found many questions like this one, but none that solved my issue.
So now I turn to you.
I have data similar to this:
ontime currency incoterms price month
1 USD FOB 234.2 01
1 CAD FOB 92.4 01
0 USD DAP 238.9 02
0 EUR FOB 100 03
1 CNY DAP 739.8 04
I use this code:
g = df$ontime #binary
a = df$currency #String
b = df$INCOTERMS #String
c = df$price #float
f = df$month #string
mod1 <- glm(g~a+b+c,family=binomial(link="logit"), data=df[f=="01",])
pred_ontime1 <- predict(mod1,df[f%in%c("02","03","04"),],type="response")
I want to test the model, which I trained on data from month 01, on months 02, 03 and 04.
The result, however, is this:
Warning message:
'newdata' had 16623 rows but variables found have 22488 rows
I have tried training on month 01 and testing on 01, 02, 03 and 04, which did not give me the warning; however, it seems inappropriate to test on data included in my training set.
The value 16623 is of course the combined number of rows in 02, 03 and 04, while 22488 is the combined number of rows in 01, 02, 03 and 04.
What can I do?
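Try running the model without saving each column to a vector first. The formula g~a+b+c refers to the standalone vectors g, a, b and c rather than to columns of df, so I think predict() can't find those names in newdata and falls back to the full-length vectors, which is where the 22488 rows come from. For the same reason, the data = df[f=="01",] subset likely has no effect on the fit. Use the column names in the formula instead: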
mod1 <- glm(ontime ~ currency + INCOTERMS + price, family = binomial(link = "logit"), data = df[df$month == "01",])
pred_ontime1 <- predict(mod1,df[df$month %in% c("02","03","04"),], type = "response")
See if that works.
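To see why the standalone vectors cause the warning, here is a minimal sketch on synthetic data. The data frame is made up for illustration and the vector name p is my own choice, not something from the question. As far as I understand it, glm() looks up formula variables in data first and then in the surrounding environment, so g, a and p are picked up at full length and the row subset is ignored.

```r
# Synthetic data, invented purely for illustration (not the asker's data)
set.seed(1)
n <- 200
df <- data.frame(
  ontime   = rbinom(n, 1, 0.5),
  currency = rep(c("USD", "CAD", "EUR"), length.out = n),
  price    = round(runif(n, 50, 800), 1),
  month    = rep(c("01", "02", "03", "04"), each = n / 4),
  stringsAsFactors = FALSE
)

train <- df[df$month == "01", ]
test  <- df[df$month %in% c("02", "03", "04"), ]

# Standalone vectors: g, a and p are not columns of `train`, so they are
# found in the calling environment and the `data` subset is not applied
g <- df$ontime
a <- df$currency
p <- df$price
bad <- glm(g ~ a + p, family = binomial(link = "logit"), data = train)

# Column names: the variables are looked up inside `data`
good <- glm(ontime ~ currency + price, family = binomial(link = "logit"),
            data = train)

nobs(bad)   # compare with nrow(df)
nobs(good)  # compare with nrow(train)

# Predictions for the held-out months, one per row of `test`
pred <- predict(good, newdata = test, type = "response")
length(pred) == nrow(test)
```

If nobs(bad) matches nrow(df), the model was fitted on every month, which would explain the 22488 in the warning.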
Here is a reproducible example for anyone interested:
df <- read.table(textConnection("ontime currency incoterms price month
0 USD DAP 234.2 01
1 CAD FOB 92.4 01
0 USD DAP 238.9 02
0 USD FOB 100 03
1 CAD DAP 739.8 04"), header = TRUE)
# read.table() parses the month column as integer (01 becomes 1), hence == 1 and 2:4
mod1 <- glm(ontime ~ currency + incoterms + price, family = binomial(link = "logit"), data = df[df$month == 1, ])
pred_ontime1 <- predict(mod1, df[df$month %in% 2:4, ], type = "response")
pred_ontime1
3 4 5
5.826215e-11 5.826215e-11 1.000000e+00
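One related thing to watch for once the column-name version runs: in the sample data from the question, EUR and CNY only appear after month 01. As far as I know, predict() stops with a "new levels" error when newdata contains a factor level the model never saw. A minimal sketch of checking for that beforehand, using hypothetical toy data frames (train, test and unseen are names I made up):

```r
# Hypothetical toy data echoing the currencies in the question's sample
train <- data.frame(currency = c("USD", "CAD"), stringsAsFactors = FALSE)
test  <- data.frame(currency = c("USD", "EUR", "CNY"), stringsAsFactors = FALSE)

# Currencies present in the test months but absent from the training month
unseen <- setdiff(unique(test$currency), unique(train$currency))
unseen

# One option (an assumption, not the only choice): drop those rows before predicting
test_ok <- test[!test$currency %in% unseen, , drop = FALSE]
nrow(test_ok)
```

Whether dropping rows is acceptable depends on the analysis; training on more than one month so that every currency is represented may be the better route.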