Tags: java, garbage-collection, svm, libsvm

java.lang.OutOfMemoryError: GC overhead limit exceeded when creating data structure for 1 million elements


When I run the code shown below, I get a java.lang.OutOfMemoryError: GC overhead limit exceeded on line 16: svm_node node = new svm_node();. The code runs on an array of about 1 million elements, each of which holds 100 shorts.

// read in a problem (in svmlight format)
private void read(SupportVector[] vectors) throws IOException
{
    int length = vectors.length; // Length of training data
    double[] classification = new double[length]; // This is redundant for our one-class SVM.
    svm_node[][] trainingSet = new svm_node[length][]; // The training set.
    for(int i = 0; i < length; i++)
    {
        classification[i] = 1; // Since classifications are redundant in our setup, they all belong to the same class, 1.

        // Build each vector. It has to be one element longer than the actual vector,
        // because the implementation needs an empty node at the end with index -1.
        svm_node[] vector = new svm_node[vectors[i].getLength() + 1];

        double[] doubles = vectors[i].toDouble(); // The SVM runs on doubles.
        for(int j = 0; j < doubles.length; j++) {
            svm_node node = new svm_node();
            node.index = j;
            node.value = doubles[j];
            vector[j] = node;
        }
        svm_node last = new svm_node();
        last.index = -1;
        vector[vector.length - 1] = last;

        trainingSet[i] = vector;
    }

    svm_problem problem = new svm_problem();
    problem.l = length;
    problem.y = classification;
    problem.x = trainingSet;
}

From the exception, I guess the garbage collector cannot properly sweep up my new svm_nodes, but I cannot see how to optimize my object creation so that I avoid creating too many svm_nodes that sit helplessly in the heap.

I cannot change the data structure, as it is what LIBSVM uses as input to its support vector machine.
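
One way to create fewer nodes without changing the data structure is to use the sparse form of the input, in which zero-valued features are left out of each vector. Whether that helps depends on how many zeros the data contains, which the question does not say. The sketch below is an illustration only: Node is a hypothetical stand-in for svm_node so that the example compiles on its own, toSparse is an invented helper name, and the indices follow the zero-based j used in the code above.

```java
import java.util.ArrayList;
import java.util.List;

class SparseVectorSketch {
    // Hypothetical stand-in for libsvm's svm_node (an int index and a double value).
    static class Node {
        int index;
        double value;
    }

    // Builds a node array that skips zero-valued features and ends with
    // the terminating node whose index is -1.
    static Node[] toSparse(double[] values) {
        List<Node> nodes = new ArrayList<>();
        for (int j = 0; j < values.length; j++) {
            if (values[j] != 0.0) {
                Node node = new Node();
                node.index = j; // same zero-based indexing as the question's code
                node.value = values[j];
                nodes.add(node);
            }
        }
        Node last = new Node();
        last.index = -1;
        nodes.add(last);
        return nodes.toArray(new Node[0]);
    }

    public static void main(String[] args) {
        Node[] sparse = toSparse(new double[] {0.0, 3.0, 0.0, 0.0, 7.0});
        // Two non-zero features plus the terminator.
        System.out.println(sparse.length);
    }
}
```

If the vectors are dense (few or no zeros), this saves nothing and the heap size remains the limiting factor.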

My question is: Is this error related to the garbage collector not being able to collect my svm_nodes, or am I simply trying to parse a data structure with too many elements?
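
A back-of-the-envelope estimate helps separate the two possibilities. The sketch below assumes a typical HotSpot object layout: about 24 bytes per svm_node (object header plus an int and a double, padded to 8 bytes), 4 bytes per array reference, and about 16 bytes of header per array. These sizes are assumptions, not measurements, and they vary by JVM and settings.

```java
class SvmNodeMemoryEstimate {
    // Assumed sizes for a typical HotSpot JVM; not measured.
    static final long BYTES_PER_NODE = 24;      // header + int index + double value, padded
    static final long BYTES_PER_REFERENCE = 4;  // 32-bit JVM or compressed references
    static final long ARRAY_HEADER_BYTES = 16;  // per svm_node[] array

    // Estimates the heap needed for the svm_node[][] training set alone.
    static long estimateBytes(long vectors, long featuresPerVector) {
        long nodesPerVector = featuresPerVector + 1; // plus the index -1 terminator
        long bytesPerVector = ARRAY_HEADER_BYTES
                + nodesPerVector * (BYTES_PER_NODE + BYTES_PER_REFERENCE);
        return vectors * bytesPerVector;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(1_000_000L, 100L);
        System.out.println(bytes + " bytes"); // prints "2844000000 bytes"
    }
}
```

Under these assumptions the training set alone needs roughly 2.8 GB, before counting the input vectors and the temporary double[] arrays. That is more than a 2 GB heap can hold, which points to the data being too large for the heap rather than to the collector failing to free the nodes: every node is still reachable from trainingSet, so none of them is garbage.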

PS: I have already set the heap size to the maximum for my 32-bit application (2 GB).


Solution

  • I launched the application in a 64-bit environment and raised the heap above 2 GB, which solved the problem. I still suspect a GC quirk, but I was unable to find it, and the larger heap was enough.
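
To confirm that a larger heap setting has actually taken effect, the JVM can report its own limit. This is a minimal sketch; HeapCheck and maxHeapMiB are illustrative names, and the heap itself is usually raised with the standard -Xmx option (for example, java -Xmx4g HeapCheck on a 64-bit JVM).

```java
class HeapCheck {
    // Returns the maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```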