Univocity parser - Handling lines with weird constructs


I am trying to figure out the best way to use the Univocity parser to handle a CSV log file whose lines look like this:

"23.62.3.74",80,"testUserName",147653,"Log Collection Device 100","31/02/15 00:05:10 GMT",-1,"10.37.255.3","TCP","destination_ip=192.62.3.74|product_id=0071|option1_type=(s-dns)|proxy_machine_ip=10.1.255.3"

As you can see, this is a comma-delimited file, but the last column holds a bunch of values, each prefixed with its field name. My requirement is to capture the values from the normal fields and, selectively, from this last big field.

I know about the master-detail row processor in Univocity, but I doubt this fits into that category. Could you point me in the right direction, please?

Note: I can handle the name-prefixed fields in rowProcessed(String[] row, ParsingContext context) if I implement a row processor, but I am looking for something native to Univocity, if possible.

Thanks, R


Solution

  • There's nothing native in the parser for that. Probably the easiest way to go about it is to implement your own RowProcessor, as you mentioned.

    One thing you can try, to make your life easier, is to use another instance of CsvParser to parse that last column:

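    //imports needed by this snippet
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.io.StringReader;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    //initialize a parser for the pipe separated bit: '=' splits key from value, '|' ends each pair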
    CsvParserSettings detailSettings = new CsvParserSettings();
    detailSettings.getFormat().setDelimiter('=');
    detailSettings.getFormat().setLineSeparator("|");
    CsvParser detailParser = new CsvParser(detailSettings);
    
    //here is the content of the last column (assuming you got it from the parser)
    String details = "destination_ip=192.62.3.74|product_id=0071|option1_type=(s-dns)|proxy_machine_ip=10.1.255.3";
    
    //The result will be a list of pairs
    List<String[]> pairs = detailParser.parseAll(new StringReader(details));
    
    //You can add the pairs to a map
    Map<String, String> map = new HashMap<String, String>();
    for (String[] pair : pairs) {
        map.put(pair[0], pair[1]);
    }
    
    //this should print something like the following (HashMap does not guarantee any ordering):
    //{destination_ip=192.62.3.74, product_id=0071, proxy_machine_ip=10.1.255.3, option1_type=(s-dns)}
    System.out.println(map);
    

    That won't be extremely fast, but a map is easy to work with when the input can carry arbitrary column names and their associated values.
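
If a second parser instance feels like too much, another option is to split that column by hand inside rowProcessed. This is plain Java, not a univocity feature, and only a minimal sketch: it assumes no key or value ever contains a literal '|', and that the first '=' in each pair separates the key from the value. The class name DetailFields and its parse method are hypothetical names made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of univocity-parsers.
class DetailFields {

    // Splits "k1=v1|k2=v2" into a map that keeps the pairs in input order.
    static Map<String, String> parse(String details) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (details == null || details.isEmpty()) {
            return fields;
        }
        // '|' is a regex metacharacter, so it has to be escaped for split().
        for (String pair : details.split("\\|")) {
            // Split on the first '=' only, so a value may itself contain '='.
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip fragments that have no key
            }
            fields.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Content of the last column, as it would arrive in row[row.length - 1].
        String details = "destination_ip=192.62.3.74|product_id=0071"
                + "|option1_type=(s-dns)|proxy_machine_ip=10.1.255.3";

        Map<String, String> fields = parse(details);

        // Pick out only the fields of interest.
        System.out.println(fields.get("destination_ip")); // 192.62.3.74
        System.out.println(fields.get("product_id"));     // 0071
    }
}
```

Inside a RowProcessor you would call DetailFields.parse(row[row.length - 1]) and read the keys you care about; keys missing from a given line simply come back as null from the map.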