machine-learning naivebayes accord.net

Naive Bayes - no samples for class label 0


Not long ago I asked a question about the Accord.NET Naive Bayes algorithm throwing an error. It turned out that I was using discrete-valued input columns without providing enough training data to cover every value I had listed for each column.

Now I am getting the exact same error, but this time only when I use a continuous value for my output column, specifically an output column of integer data type. Because the column is an integer, the Codification class does not translate it, so the raw values are passed directly to the Naive Bayes algorithm, which apparently cannot handle them.

If I manually change the column's data type to string, run it through the Codification class, and pass the codified results to the algorithm, it works correctly.

Is there any particular reason why this algorithm can't handle Continuous data types as outputs? Is there some setting I need to enable to make this work?

Some sample code:

        DataTable symbols = TrainingCodebook.Apply(DataTraining, AllAttributeNames);
        double[][] inputs = symbols.ToJagged<double>(KeptAttributeNames.ToArray());
        // *** The line that is breaking ***
        int[] outputs = symbols.ToArray<int>(outputCol);

        // *** The replacement test code that does work ***
        // DataStringTraining is the same as DataTraining, but all values are strings
        //Codification codeee = new Codification(DataStringTraining, outputCol);
        //var sym = codeee.Apply(DataStringTraining, outputCol);
        //int[] outputs = sym.ToArray<int>(outputCol);

        /*
         * Create a new instance of the learning algorithm
         * and build the algorithm
         */
        var learner = new NaiveBayesLearning<IUnivariateFittableDistribution>()
        {
            // Tell the learner how to initialize the distributions
            Distribution = (classIndex, variableIndex) => attributList[variableIndex],
        };

        NaiveBayes<IUnivariateFittableDistribution> alg = null;
        try
        {
            ProgPerformStep("Computing and training algorithm");
            alg = learner.Learn(inputs, outputs);
        }
        catch (Exception ex)
        {
            ProgPerformStep($"ERROR: Naive Bayes: {ex.Message}", ex);
            return;
        }

Solution

  • I don't have a great answer for this, but I believe the cause is that the algorithm I am using is listed on the accord.net site as a classification algorithm.

    Based on some reading, my understanding is that classification algorithms cannot handle continuous output values.

    I probably need to switch to a regression algorithm to get that functionality.

    In light of that, the solution for this algorithm is to codify the output column manually, or to convert it to a string first so that the Codification class does the job for me.
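
Codifying the output column manually might look like the sketch below. It assumes the learner expects class labels to be zero-based, contiguous integers, which the "no samples for class label 0" message suggests but which is not confirmed here. `LabelCodifier`, `Encode`, and `Decode` are hypothetical names used for illustration and are not part of Accord.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Accord.NET): maps arbitrary integer
// labels to contiguous class indices 0..N-1 and back again.
public static class LabelCodifier
{
    // Returns the class index for each raw label.
    // classValues[i] holds the raw label that class index i stands for.
    public static int[] Encode(int[] rawLabels, out int[] classValues)
    {
        // Sorted distinct raw values define the class indices.
        classValues = rawLabels.Distinct().OrderBy(v => v).ToArray();

        // Build a lookup from raw value to class index.
        var indexOf = new Dictionary<int, int>();
        for (int i = 0; i < classValues.Length; i++)
            indexOf[classValues[i]] = i;

        return rawLabels.Select(v => indexOf[v]).ToArray();
    }

    // Translates a predicted class index back to the original raw label.
    public static int Decode(int classIndex, int[] classValues)
    {
        return classValues[classIndex];
    }
}
```

With a raw integer column in a hypothetical array `rawOutputs`, calling `LabelCodifier.Encode(rawOutputs, out int[] classValues)` would give the `outputs` array to pass to `learner.Learn(inputs, outputs)`, and `Decode` would map a predicted index back to the original value.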