h2o predict error: Test/Validation dataset has no columns in common with the training set


I'm trying to use the h2o.deeplearning function from the h2o package and keep getting an error, even though I am (seemingly) following the example given in the package documentation:

https://www.rdocumentation.org/packages/h2o/versions/3.44.0.3/topics/h2o.deeplearning

My inputs are y, a length-n vector of 0/1 values indicating binary outcomes, and x, an n x m integer matrix of predictor variables.

library(h2o)
h2o.init()

y = as.factor(y)
# create a single data frame with response variable y and predictor variables x
xnew = cbind(y, x)
xnew = data.frame(xnew)
# convert the data frame to an h2o data object
x_df = as.h2o(xnew)
# y = 1 selects the first column (y) as the response
nn_model_training = h2o.deeplearning(y = 1, training_frame = x_df)

which executes without issue. I now want to use nn_model_training to predict outcomes for a test set. To give the test set the same column names as the model, I take every column name of the training frame except the first (i.e. I exclude the outcome variable y):

keep_names = names(x_df)[2:length(x_df[1,])]
x_new = Test_Mat
x_new = data.frame(x_new)
names(x_new) = keep_names
x_df_new = as.h2o(x_new)

nn_predicted = h2o.predict(nn_model_training, x_df_new)

which immediately results in the error

Error: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set

despite the fact that I renamed the columns to match the names of the x-variables in the training set.

What am I doing wrong when calling h2o.predict()?


Solution

  • I couldn't reproduce the issue, but I think it might be caused by mismatched column types: if the training frame has factor/enum columns and the test frame has integer columns, the training columns may get encoded, so the model's column names could end up as something like X1.1 instead of X1.

    Could you try attributes(x_df_new)[["types"]] and attributes(x_df)[["types"]] and then verify that the types are the same?

    You can also use nn_model_training@model$names to find out what names the model used.

    I would also suggest making sure that y is still a factor after the conversion to an h2o frame.

    You can convert columns to factors in h2o using: x_df$y <- h2o.asfactor(x_df$y)
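
One way y can stop being a factor before it ever reaches h2o is the cbind() step in the question. The base R sketch below uses made-up toy data (the values and the column names V1 and V2 are hypothetical, not the asker's) and no h2o calls. It only illustrates that cbind() on a factor and a matrix returns a plain matrix holding the factor's internal codes. Whether this is what triggers the error above is not confirmed.

```r
# Toy stand-ins for the asker's y and x (hypothetical values)
y <- as.factor(c(0, 1, 1, 0, 1))
x <- matrix(1:10, nrow = 5, dimnames = list(NULL, c("V1", "V2")))

# cbind() discards the factor class: y is stored as its level codes (1 and 2)
via_cbind <- data.frame(cbind(y, x))
class(via_cbind$y)   # "integer"

# Building the data frame directly keeps y as a factor
direct <- data.frame(y = y, x)
class(direct$y)      # "factor"
```

If the response does arrive in h2o as a numeric column, the h2o.asfactor() call shown above is the way to convert it back on the h2o side.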