Tags: r, precision, h2o, auc

How do I evaluate a multinomial classification model in R?


I am currently trying to build a multi-class prediction model to predict a letter out of the 26 English alphabets. I have built a few models using ANN, SVM, an ensemble, and Naive Bayes. But I am stuck at evaluating the accuracy of these models. Although the confusion matrix shows me the alphabet-wise true and false predictions, I am only able to get an overall accuracy for each model. Is there a way to evaluate a model's accuracy similar to the ROC and AUC values for binomial classification? Note: I am currently running the models using the H2O package as it saves me time.


Solution

  • Once you train a model in H2O, if you simply do print(fit), it will show you all the available metrics for that model type. For multiclass, I'd recommend h2o.mean_per_class_error().

    R code example on the iris dataset:

    library(h2o)
    h2o.init(nthreads = -1)  # start a local H2O cluster using all available cores
    
    data(iris)
    # Columns 1-4 are the predictors, column 5 (Species) is the response;
    # nfolds = 5 requests 5-fold cross-validation
    fit <- h2o.naiveBayes(x = 1:4,
                          y = 5,
                          training_frame = as.h2o(iris),
                          nfolds = 5)
    

    Once you have the model, you can evaluate its performance using the h2o.performance() function to view all the metrics:

    > h2o.performance(fit, xval = TRUE)
    H2OMultinomialMetrics: naivebayes
    ** Reported on cross-validation data. **
    ** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **
    
    Cross-Validation Set Metrics: 
    =====================
    
    Extract cross-validation frame with `h2o.getFrame("iris")`
    MSE: (Extract with `h2o.mse`) 0.03582724
    RMSE: (Extract with `h2o.rmse`) 0.1892808
    Logloss: (Extract with `h2o.logloss`) 0.1321609
    Mean Per-Class Error: 0.04666667
    Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,xval = TRUE)`
    =======================================================================
    Top-3 Hit Ratios: 
      k hit_ratio
    1 1  0.953333
    2 2  1.000000
    3 3  1.000000
    
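    The printout also tells you how to pull out individual pieces. For example, a short sketch using the same fitted model and the standard h2o accessors it mentions (h2o.confusionMatrix, h2o.hit_ratio_table, h2o.logloss):

    # Per-class confusion matrix on the combined cross-validation holdout predictions
    h2o.confusionMatrix(h2o.performance(fit, xval = TRUE))
    
    # Top-k hit ratios and logloss, as referenced in the printout above
    h2o.hit_ratio_table(fit, xval = TRUE)
    h2o.logloss(fit, xval = TRUE)
    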

    Or you can look at a particular metric, like mean_per_class_error:

    > h2o.mean_per_class_error(fit, xval = TRUE)
    [1] 0.04666667
    

    If you want to view performance on a held-out test set, then you can pass the test frame directly:

    perf <- h2o.performance(fit, newdata = test)  # `test` is an H2OFrame of held-out rows
    h2o.mean_per_class_error(perf)
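
    Finally, on the ROC/AUC part of the question: H2O does not (as of this writing) report an AUC for multinomial models, but you can compute a one-vs-rest AUC per class yourself from the predicted class probabilities. Below is a minimal sketch, assuming an 80/20 split of iris; the ovr_auc helper is illustrative (the rank-based Mann-Whitney formula), not part of h2o:

    # Hold out 20% of iris as a test set
    splits <- h2o.splitFrame(as.h2o(iris), ratios = 0.8, seed = 42)
    train  <- splits[[1]]
    test   <- splits[[2]]
    
    fit2 <- h2o.naiveBayes(x = 1:4, y = 5, training_frame = train)
    
    # Predicted class probabilities; the probability columns are named after the class levels
    pred   <- as.data.frame(h2o.predict(fit2, test))
    actual <- as.data.frame(test)$Species
    
    # Illustrative helper: rank-based (Mann-Whitney) AUC of one class vs. the rest
    ovr_auc <- function(p, is_pos) {
      r     <- rank(p)
      n_pos <- sum(is_pos)
      n_neg <- sum(!is_pos)
      (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    }
    
    # One-vs-rest AUC for each of the three classes
    sapply(levels(iris$Species), function(cls) ovr_auc(pred[[cls]], actual == cls))

    Averaging these per-class values gives a simple macro-averaged multiclass AUC if you need a single number.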