Tags: java, java-8, regression, apache-commons, apache-commons-math

Calculate R-Square for PolynomialCurveFitter in Apache commons-math3


Apache commons-math3 (version 3.6.1) classes such as OLSMultipleLinearRegression and SimpleRegression provide methods that calculate R-squared (calculateRSquared() and getRSquare(), respectively), but I cannot find any such method for PolynomialCurveFitter.

Right now I am computing it myself, as shown below. Is there a method in commons-math that does this?

private PolynomialFunction getPolynomialFitter(List<List<Double>> pointlist) {
    final PolynomialCurveFitter fitter = PolynomialCurveFitter.create(2);
    final WeightedObservedPoints obs = new WeightedObservedPoints();
    for (List<Double> point : pointlist) {
        obs.add(point.get(0), point.get(1));
    }

    double[] fit = fitter.fit(obs.toList());
    System.out.printf("\nCoefficient %f, %f, %f", fit[0], fit[1], fit[2]);
    final PolynomialFunction fitted = new PolynomialFunction(fit);
    return fitted;
}
private double getRSquare(PolynomialFunction fitter, List<List<Double>> pointList) {
    final double[] coefficients = fitter.getCoefficients();
    double[] predictedValues = new double[pointList.size()];
    double residualSumOfSquares = 0;
    final DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics();
    for (int i=0; i< pointList.size(); i++) {
        predictedValues[i] = predict(coefficients, pointList.get(i).get(0));
        double actualVal = pointList.get(i).get(1);
        double t = Math.pow((predictedValues[i] - actualVal), 2);
        residualSumOfSquares  += t;
        descriptiveStatistics.addValue(actualVal);
    }
    final double avgActualValues = descriptiveStatistics.getMean();
    double totalSumOfSquares = 0;
    for (int i=0; i<pointList.size(); i++) {
        totalSumOfSquares += Math.pow( (predictedValues[i] - avgActualValues),2);
    }
    return 1.0 - (residualSumOfSquares/totalSumOfSquares);
}
final PolynomialFunction polynomial = getPolynomialFitter(trainData);
System.out.printf("\nPolynomialCurveFitter R-Square %f", getRSquare(polynomial, trainData));

Solution

  • This was answered on the Apache Commons mailing list; cross-posting the answer here.

    OLSMultipleLinearRegression and SimpleRegression provide methods that return R-squared (calculateRSquared() and getRSquare(), respectively). But I am not able to find any such method for PolynomialCurveFitter.

    Right now I am doing it myself, as shown below. Is there any such method in commons-math which does this?

    "PolynomialCurveFitter" is one of the syntactic-sugar wrappers around the least-squares optimizers. No state is maintained in the (immutable) instance, so it has no fit statistics to report after the call to fit().

    private PolynomialFunction getPolynomialFitter(List<List<Double>> pointlist) {
        final PolynomialCurveFitter fitter = PolynomialCurveFitter.create(2);
        final WeightedObservedPoints obs = new WeightedObservedPoints();
        for (List<Double> point : pointlist) {
            obs.add(point.get(0), point.get(1));
        }

        double[] fit = fitter.fit(obs.toList());
        System.out.printf("\nCoefficient %f, %f, %f", fit[0], fit[1], fit[2]);
        final PolynomialFunction fitted = new PolynomialFunction(fit);
        return fitted;
    }

    This is indeed one of the intended use cases.

    private double getRSquare(PolynomialFunction fitter, List<List<Double>> pointList) {
        final double[] coefficients = fitter.getCoefficients();
        double[] predictedValues = new double[pointList.size()];
        double residualSumOfSquares = 0;
        final DescriptiveStatistics descriptiveStatistics = new DescriptiveStatistics();
        for (int i = 0; i < pointList.size(); i++) {
            predictedValues[i] = predict(coefficients, pointList.get(i).get(0));
            double actualVal = pointList.get(i).get(1);
            double t = Math.pow((predictedValues[i] - actualVal), 2);
            residualSumOfSquares += t;
            descriptiveStatistics.addValue(actualVal);
        }
        final double avgActualValues = descriptiveStatistics.getMean();
        double totalSumOfSquares = 0;
        for (int i = 0; i < pointList.size(); i++) {
            totalSumOfSquares += Math.pow((predictedValues[i] - avgActualValues), 2);
        }
        return 1.0 - (residualSumOfSquares / totalSumOfSquares);
    }

    The "predict" method is not shown here, but note that the argument you called "fitter" above is actually a polynomial function:

    http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/analysis/polynomials/PolynomialFunction.html

    Hence: predictedValues[i] = fitter.value(pointList.get(i).get(0));

    But otherwise, yes, the caller is responsible for choosing how to assess the quality of the model.

    You could use the least-squares suite of classes directly; the "Evaluation" object would then let you retrieve various measures of the fit:

    http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/fitting/leastsquares/LeastSquaresProblem.Evaluation.html

    However, they might still not be what you are looking for...
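
The hand-rolled approach discussed above can be sketched without any library dependency. The class and method names below (PolynomialFitQuality, predict, rSquared) are hypothetical and not part of commons-math; predict stands in for the helper that the question does not show and assumes the coefficients are ordered from the constant term upward, the same order the question passes to PolynomialFunction. One caveat: the getRSquare snippet above accumulates (predicted - mean)^2 in its second loop, which is usually called the explained sum of squares. This sketch instead takes the total sum of squares from the observed values, which is the conventional definition of R-squared.

```java
/** Plain-Java sketch; the names here are illustrative, not a commons-math API. */
final class PolynomialFitQuality {

    private PolynomialFitQuality() {
    }

    /**
     * Evaluates c[0] + c[1]*x + c[2]*x^2 + ... with Horner's rule.
     * Assumption: coefficients are ordered from the constant term upward.
     */
    public static double predict(double[] coefficients, double x) {
        double result = 0.0;
        for (int i = coefficients.length - 1; i >= 0; i--) {
            result = result * x + coefficients[i];
        }
        return result;
    }

    /**
     * Returns 1 - SSres / SStot, where SStot is computed from the observed
     * values ys. The result is not finite when all ys are equal (SStot == 0).
     */
    public static double rSquared(double[] coefficients, double[] xs, double[] ys) {
        if (xs.length != ys.length || xs.length == 0) {
            throw new IllegalArgumentException("xs and ys must be non-empty and of equal length");
        }

        // Mean of the observed values.
        double mean = 0.0;
        for (double y : ys) {
            mean += y;
        }
        mean /= ys.length;

        double residualSumOfSquares = 0.0;
        double totalSumOfSquares = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double residual = ys[i] - predict(coefficients, xs[i]);
            residualSumOfSquares += residual * residual;

            double deviation = ys[i] - mean;
            totalSumOfSquares += deviation * deviation;
        }
        return 1.0 - residualSumOfSquares / totalSumOfSquares;
    }

    public static void main(String[] args) {
        // Points that lie exactly on y = 1 + x + x^2.
        double[] xs = {0, 1, 2, 3};
        double[] ys = {1, 3, 7, 13};
        double[] coefficients = {1, 1, 1};
        System.out.printf("R-squared = %f%n", rSquared(coefficients, xs, ys));
    }
}
```

With commons-math on the classpath, the array returned by fitter.fit(obs.toList()) could be passed as coefficients, or predict could be replaced by PolynomialFunction.value as the reply suggests.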