
Error when using the Super CSV bean reader


I have added the following dependency:

    <dependency>
        <groupId>net.sf.supercsv</groupId>
        <artifactId>super-csv</artifactId>
        <version>2.4.0</version>
    </dependency>

These are my column-name arrays (14 and 13 names):

    private static final String[] COLS = { "col1", "col2", "col3", "col4", "col5",
        "col6", "col7", "col8", "col9", "col10", "col11",
        "col12", "col13", "col14" };

    private static final String[] TEMP_COLS = { "col1", "col2", "col3", "col4", "col5",
        "col6", "col7", "col8", "col9", "col10", "col11",
        "col12", "col13" };

This is how I build my reader:

    protected CsvPreference csvPref = CsvPreference.STANDARD_PREFERENCE;
    protected String encoding = "US-ASCII";

    InputStream is = fs.open(path);
    BufferedReader br = new BufferedReader(new InputStreamReader(is, encoding));
    ICsvBeanReader reader = new CsvBeanReader(br, csvPref);

When reading the beans, I use the following code:

    Selections bean = null;

    try {
        bean = reader.read(Selections.class, Selections.getCols());
    } catch (Exception e) {
        // bean = reader.read(Selections.class, Selections.getTempCols());
        // slf4j.error(bean.getEventCode() + bean.getProgramId());
        slf4j.error("Error Logged for bean because of COLUMNS MISMATCH");
    }

This code throws the following exception:

    java.lang.IllegalArgumentException: the nameMapping array and the number of columns read should be the same size (nameMapping length = 14, columns = 13)

I am not sure what is causing this exception. It is thrown for some records even though every record has 14 columns (I verified this with a script, and I also created a schema and loaded the file with 14 columns). Out of 7,000,000 records, 2,100,000 have this issue.
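A possible reason the script and the reader disagree: a script that only counts delimiters ignores quotes, while a CSV tokenizer treats a `"` inside a field as the start of a quoted section in which the delimiter is ordinary data. The sketch below is a simplified, single-line model written for illustration; `QuoteAwareSplit` and its methods are hypothetical and are not Super CSV's own tokenizer (which, among other things, also lets a quoted section continue onto following lines). It assumes the `'\u0001'` delimiter used in the preference from the solution.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a quote-aware CSV tokenizer (illustration only).
// A quote character anywhere in a field switches into quoted mode, where the
// delimiter is treated as ordinary data until the closing quote.
class QuoteAwareSplit {

    static List<String> tokenize(String line, char quote, char delimiter) {
        List<String> columns = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);         // delimiter is literal inside quotes
                }
            } else if (c == quote) {
                inQuotes = true;               // opening quote, even mid-field
            } else if (c == delimiter) {
                columns.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        columns.add(current.toString());
        return columns;
    }

    // What a simple checking script typically does: count delimiters, ignore quotes.
    static int naiveCount(String line, char delimiter) {
        int count = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        char delim = '\u0001';
        // 14 fields; fields 3 and 4 each contain a stray double quote.
        String line = String.join(String.valueOf(delim),
            "a", "b", "5\" pipe", "3\" pipe", "e", "f", "g",
            "h", "i", "j", "k", "l", "m", "n");

        System.out.println(naiveCount(line, delim));                // 14
        System.out.println(tokenize(line, '"', delim).size());      // 13: fields 3 and 4 merged
        System.out.println(tokenize(line, '\u0000', delim).size()); // 14: quote char never occurs
    }
}
```

The two stray quotes pair up across the delimiter between fields 3 and 4, so the quote-aware count drops from 14 to 13, which matches the `columns = 13` in the exception.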

To find out which records cause the problem, I changed the code as follows:

    Selections bean = null;

    try {
        bean = reader.read(Selections.class, Selections.getCols());
    } catch (Exception e) {
        bean = reader.read(Selections.class, Selections.getTempCols());
        slf4j.error(bean.getEventCode() + bean.getProgramId());
        slf4j.error("Error Logged for bean because of COLUMNS MISMATCH");
    }

Now the modified code throws:

    java.lang.IllegalArgumentException: the nameMapping array and the number of columns read should be the same size (nameMapping length = 13, columns = 14)

I have no idea why the Super CSV reader behaves so strangely. When a record does not have 14 columns, the read throws an exception, yet when I read again inside the catch block to print the details, the reader reports 14 columns.
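One likely explanation for the 13-then-14 behaviour: `read()` first advances to the next row and only then compares its column count with the name mapping, so the failing row is already consumed and the retry inside the `catch` block parses the *next* record. That is my reading of `CsvBeanReader`'s source rather than something stated in the post. The hypothetical `ConsumingReader` below is not Super CSV; it only reproduces that order of operations so that both messages from the post appear in sequence.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal reader (not Super CSV) that mimics the assumed order of
// operations in a bean reader's read(): advance to the next row first, then
// validate the row's column count against the name mapping.
class ConsumingReader {
    private final Iterator<List<String>> rows;
    private int rowNumber = 0;

    ConsumingReader(List<List<String>> parsedRows) {
        this.rows = parsedRows.iterator();
    }

    List<String> read(String[] nameMapping) {
        if (!rows.hasNext()) return null;   // end of input
        List<String> row = rows.next();     // the row is consumed here...
        rowNumber++;
        if (row.size() != nameMapping.length) {
            // ...so after this exception, a retry will see the NEXT row
            throw new IllegalArgumentException(String.format(
                "row %d: nameMapping length = %d, columns = %d",
                rowNumber, nameMapping.length, row.size()));
        }
        return row;
    }

    int getRowNumber() { return rowNumber; }

    public static void main(String[] args) {
        List<List<String>> parsed = Arrays.asList(
            Arrays.asList(new String[13]),   // row 1: two fields merged by stray quotes
            Arrays.asList(new String[14]));  // row 2: a normal record
        ConsumingReader reader = new ConsumingReader(parsed);
        try {
            reader.read(new String[14]);
        } catch (IllegalArgumentException first) {
            System.out.println(first.getMessage());      // row 1: nameMapping length = 14, columns = 13
            try {
                reader.read(new String[13]);             // the retry hits row 2, not row 1
            } catch (IllegalArgumentException second) {
                System.out.println(second.getMessage()); // row 2: nameMapping length = 13, columns = 14
            }
        }
    }
}
```

Instead of re-reading in the `catch` block, it may be more useful to log where the bad record is. If I recall the API correctly, Super CSV readers expose `getLineNumber()`, `getRowNumber()` and `getUntokenizedRow()` for exactly this kind of diagnosis.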

Please help me debug this issue. I can add more details if needed.


Solution

  • I finally resolved the problem. It was caused by the quote character that I had set in my CSV preferences:

    new CsvPreference.Builder('"', '\u0001', "\r\n").build()
    

    My incoming data contains the " character as part of the values. The issue was resolved when I replaced the quote character with a character that never appears in the incoming data.

    I am not an expert at this; the problem was caused by my own configuration, and Super CSV is not at fault. I believe Super CSV is a decent API to explore and use.

    To learn more about column quote modes, please refer to the Super CSV API documentation: https://super-csv.github.io/super-csv/apidocs/org/supercsv/quote/ColumnQuoteMode.html
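Whether a candidate quote character really never appears in the data can be checked before building the preference. The hypothetical `QuoteCharPicker` below scans raw lines and returns the first candidate that does not occur in any of them; the candidate list and the sample lines are assumptions for illustration. The chosen character would then take the place of `'"'` as the first argument to `CsvPreference.Builder`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: find a quote character that never occurs in the raw input,
// so the CSV tokenizer never enters quoted mode because of data content.
class QuoteCharPicker {

    // True if the character appears anywhere in any of the lines.
    static boolean occursIn(List<String> lines, char c) {
        for (String line : lines) {
            if (line.indexOf(c) >= 0) return true;
        }
        return false;
    }

    // Returns the first candidate that does not appear in any line,
    // or throws if every candidate is already used by the data.
    static char pickUnusedQuote(List<String> lines, char... candidates) {
        for (char c : candidates) {
            if (!occursIn(lines, c)) return c;
        }
        throw new IllegalStateException("every candidate quote character occurs in the data");
    }

    public static void main(String[] args) {
        // Sample records delimited by U+0001; the first one contains both " and '.
        List<String> sample = Arrays.asList(
            "a\u0001it's\u00015\" pipe",
            "c\u0001d\u0001e");
        char quote = pickUnusedQuote(sample, '"', '\'', '\u0002');
        System.out.println((int) quote); // prints 2
    }
}
```

The trade-off of this approach is that, with a quote character that never occurs, no field can contain the delimiter or a line break anymore, since there is no way to quote it; that seems acceptable for data delimited by `'\u0001'`. A sample only gives a hint, so for all 7,000,000 records the whole file would need to be scanned.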