
WEKA cross validate linear regression - can I get RMSPE?


Is it possible to get the RMSPE (Root Mean Square Percentage Error) after cross-validating a model? I can easily get the RMSE, but I don't see a way to get the RMSPE.

Sample code I've put together with WEKA linear regression cross validation:

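        // imports needed by this snippet
        import java.util.ArrayList;
        import java.util.Random;
        import weka.classifiers.Evaluation;
        import weka.classifiers.functions.LinearRegression;
        import weka.core.Attribute;
        import weka.core.DenseInstance;
        import weka.core.Instances;
        import weka.core.Utils;

        // load data and set class index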
        final ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("x"));
        attributes.add(new Attribute("y"));

        Instances data = new Instances("name", attributes, 0);
        data.add(new DenseInstance(1d, new double[]{5, 80}));
        // ... add more data

        // -c last
        data.setClassIndex(data.numAttributes() - 1);

        // classifier
        final LinearRegression cls = new LinearRegression();

        // other options
        int seed = 129;
        int folds = 3;

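        // randomize data (note: randData is never used below;
        // crossValidateModel shuffles its own copy of the data with the Random passed to it)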
        Random rand = new Random(seed);
        Instances randData = new Instances(data);
        randData.randomize(rand);
        if (randData.classAttribute().isNominal())
            randData.stratify(folds);

        // perform cross-validation
        Evaluation eval = new Evaluation(data);

        eval.crossValidateModel(cls, data, folds, new Random(seed));

        System.out.println("rootMeanSquaredError " + eval.rootMeanSquaredError());
        System.out.println("rootRelativeSquaredError " + eval.rootRelativeSquaredError());
        System.out.println("rootMeanPriorSquaredError " + eval.rootMeanPriorSquaredError());

        // output evaluation
        System.out.println();
        System.out.println("=== Setup ===");
        System.out.println("Classifier: " + cls.getClass().getName() + " " + Utils.joinOptions(cls.getOptions()));
        System.out.println("Dataset: " + data.relationName());
        System.out.println("Folds: " + folds);
        System.out.println("Seed: " + seed);
        System.out.println();
        System.out.println(eval.toSummaryString("=== " + folds + "-fold Cross-validation ===", true));


        /*

        === Setup ===
        Classifier: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8 -num-decimal-places 4
        Dataset: name
        Folds: 3
        Seed: 129

        === 3-fold Cross-validation ===
        Correlation coefficient                  0.6289
        Mean absolute error                      7.5177
        Root mean squared error                  8.262
        Relative absolute error                 85.7748 %
        Root relative squared error             77.9819 %
        Total Number of Instances               15

         */

Solution

  • Weka doesn't compute the RMSPE out of the box. I've put together a small Weka package, rmspe-weka-package, that should do the trick for numeric classes (NB: I've only done limited testing).

    After an evaluation run (with that package installed), you should be able to retrieve the statistic as follows:

    Evaluation eval = ... // initialize your evaluation object
    ...                   // perform your evaluation
    double rmspe = eval.getPluginMetric("weka.classifiers.evaluation.RMSPE").getStatistic("RMSPE");
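If installing a package isn't an option, the statistic can also be computed by hand from actual/predicted pairs. The sketch below is plain Java with no Weka dependency and is not the package's implementation. The class name `RmspeSketch` and the method `rmspe` are made up for illustration. It assumes the common definition RMSPE = 100 * sqrt(mean(((actual - predicted) / actual)^2)) and skips instances whose actual value is 0, since the percentage error is undefined there; other conventions exist. With Weka, the pairs could probably be collected from `eval.predictions()` (each `Prediction` exposes `actual()` and `predicted()`), assuming the evaluation was not told to discard predictions.

```java
import java.util.Locale;

public class RmspeSketch {

    /**
     * Root Mean Square Percentage Error, returned as a percentage.
     * Assumed definition: 100 * sqrt(mean(((actual - predicted) / actual)^2)).
     * Instances with an actual value of 0 are skipped (assumption: the
     * percentage error is undefined for them). Returns NaN if nothing is left.
     */
    public static double rmspe(double[] actual, double[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        double sumSquared = 0.0;
        int used = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == 0.0) {
                continue; // percentage error undefined, skip this instance
            }
            double relative = (actual[i] - predicted[i]) / actual[i];
            sumSquared += relative * relative;
            used++;
        }
        return used == 0 ? Double.NaN : 100.0 * Math.sqrt(sumSquared / used);
    }

    public static void main(String[] args) {
        // made-up values, purely for illustration
        double[] actual = {100, 200, 50};
        double[] predicted = {110, 180, 50};
        System.out.printf(Locale.ROOT, "RMSPE: %.4f %%%n", rmspe(actual, predicted));
    }
}
```

Note that this pools all predictions into one figure, which is also how Weka's cross-validation summary reports RMSE (over all test instances rather than averaged per fold).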