My training data looks like this:
A B C D
1 1 1 1
1 1 1 2
1 1 2 1
1 1 2 1
1 1 2 2
1 1 2 2
1 2 1 1
1 2 1 1
1 2 1 2
1 2 1 2
1 2 2 1
1 2 2 2
2 1 1 1
2 1 1 1
2 1 1 2
2 1 1 2
2 1 2 1
2 1 2 1
2 1 2 2
2 1 2 2
2 2 1 1
2 2 1 2
2 2 2 1
2 2 2 2
2 2 2 2
And my test data:
A B C D
1 1 2 1
1 1 2 2
1 1 1 1
2 1 2 2
I fitted the network using:
dag <- model2network("[A][B][C|A:B][D|A:B:C]")
training <- bn.fit(dag, trainingData, method = "mle", keep.fitted = TRUE)
And I am trying to predict values for column D using:
predicted = predict(training, node = "D", data = testData, method = "parents", prob = FALSE)
But I get the error
Error in check.data(data, allow.levels = TRUE) : variable B must have at least two levels.
How do I fix this? I thought the test data did not need to contain every level that appears in the training data. In fact, shouldn't it be possible to predict even if the test data has only a single row?
Since your variables are all encoded as factors, each one carries a list of factor levels. When you create trainingData, column B contains both 1 and 2, so its factor levels are (implicitly, in the background) set to c(1, 2). But when you create testData, column B contains only 1, so its factor levels are (implicitly, in the background) set to just 1.
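To see the difference in levels directly, here is a minimal base R sketch. The vectors train_B and test_B are hypothetical stand-ins for column B, not your actual data frames.

```r
# Hypothetical miniature of column B; not the asker's actual data frames.
train_B <- factor(c(1, 1, 2, 2))
test_B  <- factor(c(1, 1, 1, 1))

levels(train_B)   # both "1" and "2" are registered as levels
levels(test_B)    # only "1" is registered as a level
nlevels(test_B)   # a single level, which is what the error message complains about
```

Running levels() on your own testData$B should show the same thing: a single level.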
We can fix this by explicitly stating that testData$B has the levels c(1, 2), even though only 1 appears in the data.
testData$B <- factor(testData$B, levels=c(1, 2))
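If other columns might run into the same problem, one possible generalisation is to copy the levels from the training data for every column at once. This is only a sketch: the two small data frames below are hypothetical stand-ins for your trainingData and testData, and it assumes both have the same column names and are already factors.

```r
# Hypothetical stand-ins for the asker's trainingData and testData.
trainingData <- data.frame(
  A = factor(c(1, 1, 2, 2)),
  B = factor(c(1, 2, 1, 2)),
  C = factor(c(1, 2, 2, 1)),
  D = factor(c(1, 2, 1, 2))
)
testData <- data.frame(
  A = factor(c(1, 1, 1, 2)),
  B = factor(c(1, 1, 1, 1)),   # only one value observed, so only one level
  C = factor(c(2, 2, 1, 2)),
  D = factor(c(1, 2, 1, 2))
)

# Re-declare every test column with the levels of the matching training column.
for (col in names(trainingData)) {
  testData[[col]] <- factor(testData[[col]], levels = levels(trainingData[[col]]))
}

# Number of levels per test column after the fix.
sapply(testData, nlevels)
```

The values in testData are unchanged; only the set of declared levels grows, so a test set with a single row would work the same way.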