
Java, uniVocity parser: a column's data contains commas and is not enclosed in quotes


I mostly use uniVocity as my CSV parser, and it is an excellent one. I have hit a problem with the rows below. The file always has a fixed number of 7 columns. The problem is with the Client name, which can contain commas; the column next to it, Type, is generally S or P.

Below is the test data:

Date,Code,Company,Client,Type,Quantity,Price
03/03/2014,500103,BHEL,PoI THROUGH DoI, Affairs,S,114100000,165.55
21/04/2017,533309,DALMI,KKR MAURITIUS CEMENT, LTD.,S,106020,2050.00
21/04/2017,533309,DALMI,KKR MAURITIUS CEMENT, LTD.,P,141740,2050.00

The data above has a problem with the Client name, because the value itself contains a comma and is not enclosed in quotes. The client names are:

PoI THROUGH DoI, Affairs
KKR MAURITIUS CEMENT, LTD.
KKR MAURITIUS CEMENT, LTD.

Could you please let me know how to handle this?

Thanks


Solution

  • You can't really do a lot here if the data doesn't come enclosed in quotes. All you can realistically do is check the row length: if it is greater than 7, you know that the extra columns are part of the client name, since that is the only column that can contain commas.

    Here is my solution:

    // rows: the String[] rows returned by the parser (assumed here to be a List<String[]>)
    for (String[] row : rows) {
        if (row.length > 7) {
            int extraColumns = row.length - 7; // we have extra columns
            String[] fixed = new String[7]; // let's create a row in the correct format

            // i walks the broken row, j walks the fixed row
            for (int i = 0, j = 0; i < row.length; i++, j++) {
                fixed[j] = row[i]; // copy the value across

                if (i == 3) { // hit the first column with a name in it
                    for (int k = i + 1; k <= i + extraColumns; k++) { // append a comma and each value that belongs to the name
                        fixed[j] += ", " + row[k];
                    }

                    i += extraColumns; // skip the merged columns; the values after them go to position j + 1 onwards
                }
            }
            row = fixed; // replaces the original broken row
        }

        // prints the resulting row, values in square brackets for clarity
        for (String element : row) {
            System.out.print("[" + element + "],");
        }
        System.out.println();
    }
    

    This produces the output:

    [Date],[Code],[Company],[Client],[Type],[Quantity],[Price],
    [03/03/2014],[500103],[BHEL],[PoI THROUGH DoI, Affairs],[S],[114100000],[165.55],
    [21/04/2017],[533309],[DALMI],[KKR MAURITIUS CEMENT, LTD.],[S],[106020],[2050.00],
    [21/04/2017],[533309],[DALMI],[KKR MAURITIUS CEMENT, LTD.],[P],[141740],[2050.00],
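
The same idea can be packaged as a small reusable helper. The following is only a sketch and is not part of uniVocity: the class name `ClientNameFixer`, the method `mergeExtraColumns` and its parameters are hypothetical names introduced for illustration. It assumes that exactly one column (here the client name at index 3) can contain unquoted commas, and that the parser has already trimmed whitespace around each value.

```java
import java.util.Arrays;

class ClientNameFixer {

    /**
     * Merges surplus columns back into the column at nameIndex.
     * Rows that already have the expected length (or fewer columns) are returned unchanged.
     */
    static String[] mergeExtraColumns(String[] row, int expectedColumns, int nameIndex) {
        int extra = row.length - expectedColumns;
        if (extra <= 0) {
            return row; // nothing to repair
        }

        String[] fixed = new String[expectedColumns];

        // columns before the name are copied unchanged
        System.arraycopy(row, 0, fixed, 0, nameIndex);

        // the name spans nameIndex .. nameIndex + extra (inclusive); re-join it with ", "
        String[] nameParts = Arrays.copyOfRange(row, nameIndex, nameIndex + extra + 1);
        fixed[nameIndex] = String.join(", ", nameParts);

        // columns after the name shift left by `extra` positions
        System.arraycopy(row, nameIndex + extra + 1, fixed, nameIndex + 1,
                expectedColumns - nameIndex - 1);

        return fixed;
    }

    public static void main(String[] args) {
        String[] broken = {"03/03/2014", "500103", "BHEL", "PoI THROUGH DoI", "Affairs",
                "S", "114100000", "165.55"};
        String[] fixed = mergeExtraColumns(broken, 7, 3);
        System.out.println(fixed.length + " columns, client = " + fixed[3]);
    }
}
```

Each `String[]` returned by the parser could be passed through `mergeExtraColumns(row, 7, 3)` before further processing. This only works because a single column is ambiguous; if two columns could contain stray commas, the row length alone would not tell which one to extend.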
    

    Hope it helps.