I'm trying to use the nnet library to create a multinomial logistic regression model from my training data, to see if I can use it to predict my test data.
I set everything up in R using this script:
library(nnet)
folder <- "C:/***/"
trainingfile <- "training-set.txt"
testfile <- "test-set.txt"
train <- read.table(paste(folder, trainingfile, sep=''), sep=",", header=FALSE)
train.classes <- t(train[1:1])
train.data <- train[2:16]
test <- read.table(paste(folder, testfile, sep=''), sep=",", header=FALSE)
test.classes <- t(test[1:1])
test.data <- test[2:16]
train.model <- multinom(V1 ~ ., train, maxit=450) #converges after roughly 430 iterations
This all works well, and multinom reports convergence.
To classify the test data with the model, I use:
predictions <- predict(train.model, test.data)
However, I'm then greeted with the error: Error in eval(expr, envir, enclos) : object 'V17' not found. Yet when I inspect train.model, I see that there is indeed an object 'V17':
> train.model
Call:
multinom(formula = V1 ~ ., data = train, maxit = 450)
Coefficients:
(Intercept) V2
B -12.9514837 1.0668464
C -48.1154774 1.6160071
D -2.2901219 1.0062945
E -39.4371326 0.6848848
F -20.6759707 0.8613838
G -21.4471217 1.2858480
H -17.4302527 0.8102932
I -4.7391825 1.3124087
J -12.3513130 1.1404751
K -13.9557738 0.7574471
L -0.4915034 0.7191369
M -14.0855382 0.8888810
N -0.4372225 0.6041747
O -18.2596753 1.2708861
P -9.8504326 1.2672870
Q -20.9940977 1.8104502
R -5.8030089 0.8677690
S -12.9944084 0.8097735
T -32.5636344 1.8977861
U -9.1752184 1.6059663
V -13.5695897 1.4547335
W -6.2590220 1.1292715
X -4.5939135 0.7603754
Y -15.6763068 1.6498374
Z -37.1840564 0.7382329
*SNIP*
  V17
B 1.63319426
C 1.93093207
D 0.80392847
E 1.79189803
F 1.32248565
G 1.72440154
H 1.22022835
I 1.03014847
J 0.20977345
K 2.40335443
L 1.17253978
M 0.65072776
N 0.46675729
O 1.16579165
P 1.50787334
Q 1.41267773
R 1.71666099
S 0.72543894
T 0.64857852
U 0.32401569
V 1.33290027
W 0.83846524
X 1.02863203
Y -0.05005955
Z 0.13792242
Residual Deviance: 26196.1
AIC: 27046.1
This is very strange, and I have no clue why the error is occurring. To get more information, I tried calling summary(train.model), but that just hangs R indefinitely. I've tried both the 32-bit and 64-bit versions of R 2.15.2 (the latest stable version) and the result is the same. Does anybody have a clue how I can resolve the error and the hang, and how I can correctly predict using the model created by multinom?
Summarizing from comments above:
Ensure that the following is true:
all(names(train)[-1] %in% names(test.data)) # [-1] to ignore V1
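Otherwise, predict will throw an error. In the question, the model was fit on every column of train, including V17, while test.data <- test[2:16] keeps only V2 through V16, so V17 is missing from the data passed to predict.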
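A minimal sketch of that check, assuming simulated data in place of the question's files (the make_data helper and the five-column layout are made up for illustration). It reproduces the off-by-one column selection on a smaller scale, lists the missing predictor, and then predicts after dropping only the class column:

```r
library(nnet)

set.seed(1)

# Hypothetical stand-in for the question's files:
# V1 is the class label, V2..V5 are numeric predictors.
make_data <- function(n) {
  data.frame(
    V1 = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
    V2 = rnorm(n), V3 = rnorm(n), V4 = rnorm(n), V5 = rnorm(n)
  )
}
train <- make_data(200)
test  <- make_data(50)

model <- multinom(V1 ~ ., data = train, trace = FALSE)

# Mirrors the question's selection: the last predictor (V5 here) is left out.
bad.data <- test[2:4]
missing.cols <- setdiff(names(train)[-1], names(bad.data))
print(missing.cols)  # names of predictors the model needs but bad.data lacks

# Dropping only the class column keeps every predictor,
# regardless of how many columns the file has.
test.data <- test[-1]
stopifnot(all(names(train)[-1] %in% names(test.data)))

predictions <- predict(model, test.data)

# Cross-tabulate true classes against predicted classes
print(table(actual = test$V1, predicted = predictions))
```

Selecting with test[-1] rather than a hard-coded range such as test[2:16] avoids having to count columns by hand.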
And to add a little bit of value: in my experience, summary.multinom takes such a long time because it calls vcov.multinom, which has to calculate the Hessian. If you're making multiple calls to summary(train.model), it makes sense to calculate the Hessian once, in the call to multinom (which may still take a while):
train.model <- multinom(V1 ~ ., train, maxit=450, Hess = TRUE)
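As a rough illustration on a small simulated data set (the data and variable names below are made up, not the question's), the Hessian requested with Hess = TRUE is stored on the fitted object. As far as I can tell from nnet's source, vcov.multinom uses that stored matrix when it is present, so later calls to summary should not need to recompute it:

```r
library(nnet)

set.seed(1)

# Hypothetical data: a 3-level class V1 and four numeric predictors.
n <- 200
d <- data.frame(
  V1 = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  V2 = rnorm(n), V3 = rnorm(n), V4 = rnorm(n), V5 = rnorm(n)
)

# Ask multinom to compute and keep the Hessian while fitting.
fit <- multinom(V1 ~ ., data = d, Hess = TRUE, trace = FALSE)

# The Hessian is now part of the fitted object.
stopifnot(is.matrix(fit$Hessian))

# summary() derives standard errors from the variance-covariance matrix.
s <- summary(fit)
print(s$standard.errors)
```

On a model the size of the one in the question (25 non-reference classes and 17 coefficients each), the one-time Hessian computation can still be slow, as noted above.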