java csv netbeans jfreechart swingworker

Parsing a CSV file and storing the result as a JFreeChart dataset consumes heap space


I have a NetBeans module application that runs fine when launched from the NetBeans IDE. But when I run the distribution executable from the unzipped generated folder, the application's SwingWorker task stops after a while: it loops through a couple of files and then stops. My best guess is that I have to do something about the loop where I process the CSV files. Any idea or hint would be most appreciated. The files range from 2,000 to 600,000 rows, and each contains 5 time series of double values. I store the datasets in a collection.

Here is my method with the while loop

protected XYDataset generateDataSet(String filePath) {

    TimeSeriesCollection dataset = null;
    try {
        dataset = new TimeSeriesCollection();

        boolean isHeaderSet = false;

        String fileRow;
        StringTokenizer tokenizer;
        BufferedReader br;
        List<String> headers;
        String encoding = "UTF-8";
        br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), encoding));

        //br = new BufferedReader(new FileReader(filePath));
        if (!br.ready()) {
            throw new FileNotFoundException();
        }
        fileRow = br.readLine();

Loop starts here

        while (fileRow != null) {

            if (!isHeaderSet) {
                headers = getHeaders(fileRow);
                for (String string : headers) {
                    dataset.addSeries(new TimeSeries(string));
                }
                isHeaderSet = true;
            }
            if (fileRow.startsWith("#")) {
                fileRow = br.readLine();
            }
            String timeStamp = null;
            String theTok1 = null;
            String theTok2;
            tokenizer = new StringTokenizer(fileRow);
            if (tokenizer.hasMoreTokens()) {
                theTok1 = tokenizer.nextToken().trim();
            }
            if (tokenizer.hasMoreTokens()) {
                theTok2 = tokenizer.nextToken().trim();
                timeStamp = theTok1 + " " + theTok2;
            }

            Millisecond m = null;

            if (timeStamp != null) {
                m = getMillisecond(timeStamp);
            }

            int serieNumber = 0;
            br.mark(201);
            if (br.readLine() == null) {
                br.reset();
                while (tokenizer.hasMoreTokens()) {
                    if (dataset.getSeriesCount() > serieNumber) {
                        dataset.getSeries(serieNumber).add(m, parseDouble(tokenizer.nextToken().trim()), true);

In the last line of code above, I set the notify flag to true only on the very last row of the CSV file. Otherwise the data set would be looped through every time I add to a series, and it is enough to do that on the last row.

                    } else {
                        tokenizer.nextToken();
                    }
                    serieNumber++;
                }
            } else {
                br.reset();
                while (tokenizer.hasMoreTokens()) {
                    if (dataset.getSeriesCount() > serieNumber) {
                        dataset.getSeries(serieNumber).add(m, parseDouble(tokenizer.nextToken().trim()), false);

                    } else {
                        tokenizer.nextToken();
                    }
                    serieNumber++;
                }
            }
            fileRow = br.readLine();
        }
        br.close();
    } catch (FileNotFoundException ex) {
        printStackTrace(ex);
    } catch (IOException | ParseException ex) {
        printStackTrace(ex);
    }
    return dataset;
}

Here are also the methods I use for processing the headers and the timestamp; they are called from the code above. (Sometimes the CSV file has no headers.)

/**
 * If the leading character "#" is missing, the headers will all be "NA".
 *
 * @param fileRow a row with any number of headers
 * @return list of headers
 */
protected List<String> getHeaders(String fileRow) {
    List<String> returnValue = new ArrayList<>();
    StringTokenizer tokenizer;
    if (fileRow.startsWith("#")) {
        tokenizer = new StringTokenizer(fileRow.substring(1));
    } else {
        tokenizer = new StringTokenizer(fileRow);
        tokenizer.nextToken();
        tokenizer.nextToken();//date and time is one header but two tokens
        while (tokenizer.hasMoreTokens()) {
            returnValue.add("NA");
            tokenizer.nextToken();
        }
        return returnValue;
    }
    tokenizer.nextToken();
    while (tokenizer.hasMoreTokens()) {
        returnValue.add(tokenizer.nextToken().trim());
    }
    return returnValue;
}

/**
 * @param timeStamp must match pattern "yyyy-MM-dd HH:mm:ss.SSS"
 * @return the parsed timestamp as a Millisecond
 * @throws ParseException
 */
public Millisecond getMillisecond(String timeStamp) throws ParseException {
    Date date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").parse(timeStamp);
    return new Millisecond(date);
}

Solution

  • Assuming that you invoke generateDataSet() from your implementation of doInBackground(), alterations to dataset will typically fire events on the background thread, a violation of Swing's single-thread rule. Instead, publish() interim results and process() them on the event dispatch thread, as shown here.
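
A minimal sketch of the publish()/process() pattern follows, using only the JDK so that it runs without JFreeChart. The names Row, parseRow, CsvWorker and model are hypothetical and not from the question. The model list stands in for the TimeSeriesCollection, and Double.parseDouble stands in for the question's own parseDouble helper, whose behavior is not shown. In a real application, process() would be the place to call add() on the series, since SwingWorker invokes it on the event dispatch thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.SwingWorker;

class CsvWorkerSketch {

    /** Hypothetical holder for one parsed row: timestamp text plus numeric columns. */
    static final class Row {
        final String timeStamp;
        final double[] values;

        Row(String timeStamp, double[] values) {
            this.timeStamp = timeStamp;
            this.values = values;
        }
    }

    /** Parses "date time v1 v2 ..."; returns null for blank, comment or short rows. */
    static Row parseRow(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length < 3) {
            return null;
        }
        double[] values = new double[tokens.length - 2];
        for (int i = 2; i < tokens.length; i++) {
            // Assumption: plain Double.parseDouble replaces the question's parseDouble.
            values[i - 2] = Double.parseDouble(tokens[i]);
        }
        return new Row(tokens[0] + " " + tokens[1], values);
    }

    /** Reads rows in the background and hands them to the EDT in chunks. */
    static final class CsvWorker extends SwingWorker<Integer, Row> {
        private final BufferedReader reader;
        // Stand-in for the chart dataset; only process() touches it.
        final List<Row> model = new ArrayList<>();

        CsvWorker(BufferedReader reader) {
            this.reader = reader;
        }

        @Override
        protected Integer doInBackground() throws IOException {
            int count = 0;
            // try-with-resources closes the reader even if parsing fails.
            try (BufferedReader br = reader) {
                String line;
                while ((line = br.readLine()) != null) {
                    Row row = parseRow(line);
                    if (row != null) {
                        publish(row); // queued for delivery to process()
                        count++;
                    }
                }
            }
            return count;
        }

        @Override
        protected void process(List<Row> chunk) {
            // Called on the event dispatch thread; update the dataset here.
            model.addAll(chunk);
        }
    }

    public static void main(String[] args) throws Exception {
        String csv = "# Time A B\n"
                + "2014-01-01 12:00:00.000 1.5 2.5\n"
                + "2014-01-01 12:00:01.000 1.6 2.4\n";
        CsvWorker worker = new CsvWorker(new BufferedReader(new StringReader(csv)));
        worker.execute();
        // get() blocks until doInBackground() has returned.
        System.out.println("rows parsed: " + worker.get());
    }
}
```

Because SwingWorker coalesces several publish() calls into one process() call, the dataset is updated in batches rather than once per row. Note that get() returning does not guarantee that every published chunk has already been processed, so code that needs the complete dataset should run from done() or a later event on the event dispatch thread.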