How do I skip white-space only lines using Super CSV?


How do I configure Super CSV to skip blank or white-space only lines?

I'm using the CsvListReader and sometimes I'll get a blank line in my data. When this happens, I get an exception to the effect of:

number of CellProcessors must match number of fields

I'd like to simply skip these lines.


Solution

  • Update: Super CSV 2.1.0 (released April 2013) allows you to supply a CommentMatcher via the preferences, which lets you skip lines that are considered comments. There are two built-in matchers (CommentStartsWith and CommentMatches), or you can supply your own. In this case you could use new CommentMatches("\\s+") to skip whitespace-only lines.
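
    To see which lines the "\\s+" pattern would treat as comments, here is a small stand-alone sketch using only java.util.regex. It assumes the matcher tests the whole line against the pattern (full-match semantics); the class and method names are made up for illustration and are not part of Super CSV.

```java
import java.util.regex.Pattern;

public class WhitespaceLineCheck {

    // Same pattern as suggested for CommentMatches above.
    private static final Pattern WHITESPACE_ONLY = Pattern.compile("\\s+");

    // True only when the entire line is one or more whitespace characters.
    static boolean isWhitespaceOnly(String line) {
        return WHITESPACE_ONLY.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "   ", "\t", "", "a,b", " a " };
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + isWhitespaceOnly(s));
        }
    }
}
```

    Note that the pattern does not match a zero-length line, but those are already skipped by Super CSV, so the two behaviours together cover every kind of blank line.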


    Super CSV only skips lines of zero length (just a line terminator).

    It's not a valid CSV file if there are blank lines (see rule 4 of RFC 4180, which states that "Each line should contain the same number of fields throughout the file"). The only time a blank line is valid is if it's part of a multi-line field surrounded by quotes, e.g.:

    column1,column2
    "multi-line field
    
    with a blank line",value2
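
    The quoted-field exception above is why you can't just strip every blank line before the file reaches the parser. Here is a rough sketch of a pre-filter that only drops blank lines outside quoted fields. It assumes the quote character is the double quote and that escaped quotes are written as two double quotes; QuoteAwareBlankLineFilter is a hypothetical name, not a Super CSV class.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareBlankLineFilter {

    // Drops whitespace-only lines unless they fall inside a quoted field.
    static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        boolean insideQuotes = false;
        for (String line : lines) {
            if (!insideQuotes && line.trim().isEmpty()) {
                continue; // blank line between records: skip it
            }
            kept.add(line);
            // Every quote character toggles the state. An escaped quote ("")
            // toggles twice, so it leaves the state unchanged.
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    insideQuotes = !insideQuotes;
                }
            }
        }
        return kept;
    }
}
```

    Run against the example above, this keeps the empty line inside "multi-line field ... with a blank line" and would only drop blank lines that sit between records.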
    

    That being said, it might be possible to make Super CSV a bit more lenient with blank lines (it could ignore them). If you could post a feature request on our SourceForge page, we can investigate this further and potentially add this functionality in a future release.

    That doesn't help you right now though!

    I haven't done extensive testing on this, but it should work :) You can write your own tokenizer that skips blank lines:

    package org.supercsv.io;
    
    import java.io.IOException;
    import java.io.Reader;
    import java.util.List;
    
    import org.supercsv.prefs.CsvPreference;
    
    public class SkipBlankLinesTokenizer extends Tokenizer {
    
        public SkipBlankLinesTokenizer(Reader reader, CsvPreference preferences) {
            super(reader, preferences);
        }
    
        @Override
        public boolean readColumns(List<String> columns) throws IOException {
    
            boolean moreInput = super.readColumns(columns);
    
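            // keep reading while the row is blank: no columns, or a single
            // column that is null or contains only whitespace (the null check
            // is defensive, in case an empty column comes back as null)
            while (moreInput && (columns.isEmpty() ||
                                 (columns.size() == 1 &&
                                  (columns.get(0) == null ||
                                   columns.get(0).trim().isEmpty())))) {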
                moreInput = super.readColumns(columns);
            }
    
            return moreInput;
        }
    
    }
    

    And just pass this into the constructor of your reader (you'll have to pass the preferences into both the reader and the tokenizer):

    ICsvListReader listReader = null;
    try {
        CsvPreference prefs = CsvPreference.STANDARD_PREFERENCE;
        listReader = new CsvListReader(
            new SkipBlankLinesTokenizer(new FileReader(CSV_FILENAME), prefs),
            prefs);
    ...
    

    Hope this helps