Tags: java, csv, parsing, univocity

Key-value parser implementation with a CSV parser in Java


I am writing a program to parse a key-value based log like this:

dstcountry="United States" date=2018-12-13 time=23:47:32

I am using the Univocity parser to do that. Here is my code:

import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setDelimiter(' ');   // fields are separated by spaces
parserSettings.getFormat().setQuote('"');
parserSettings.getFormat().setQuoteEscape('"');
parserSettings.getFormat().setCharToEscapeQuoteEscaping('"');
CsvParser keyValueParser = new CsvParser(parserSettings);
String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
String[] resp = keyValueParser.parseLine(line);

But the parser gives me this output:

dstcountry="United, 
States", 
date=2018-12-13, 
time=23:47:32

whereas the expected output was:

dstcountry="United States", 
date=2018-12-13, 
time=23:47:32
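For comparison, the actual output above is exactly what a plain split on every space produces. As far as I know, univocity only treats a quote as an enclosing quote when it is the first character of a value, so the quote after `dstcountry=` is read as an ordinary character and the space inside `"United States"` still acts as a delimiter. A minimal sketch reproducing that output with `String.split` (the class and method names are hypothetical):

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // Hypothetical helper: splits on every space and ignores quotes entirely.
    // For this log line it yields the same four tokens the CSV parser returned.
    public static String[] splitOnSpaces(String line) {
        return line.split(" ");
    }

    public static void main(String[] args) {
        String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
        System.out.println(Arrays.toString(splitOnSpaces(line)));
        // prints [dstcountry="United, States", date=2018-12-13, time=23:47:32]
    }
}
```

A quote-aware splitter has to remember whether it is currently inside a quoted section, even when the quote appears in the middle of a token.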

Is there a problem with my code, or is this a parser bug?

Regards,
Hari


Solution

  • I ended up writing my own parser. I am pasting it here for future reference in case anybody needs it. Suggestions and comments are welcome.

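    import java.util.ArrayList;
    import java.util.List;

    public class KeyValueLineParser { // class name is a placeholder; the original posted only the members

        private static final int INSIDE_QT = 1;
        private static final int OUTSIDE_QT = 0;

        // Splits logLine on `delimiter`, keeping delimiters that appear inside quoted sections.
        // `removeQuotes` was used but never declared in the original snippet; it is assumed here to be
        // a parameter: true strips the quote characters from the tokens, false keeps them.
        public String[] parseLine(char delimiter, char quote, char quoteEscape, char charToEscapeQuoteEscaping,
                                  boolean removeQuotes, String logLine) {
            char[] line = logLine.toCharArray();
            List<String> strList = new ArrayList<>();
            int state = OUTSIDE_QT;
            char lastChar = '\0';
            // Records that the current token contained a quoted section, so an empty quoted value
            // at the end of the line is still emitted (instead of appending a '\0' placeholder).
            boolean quotedToken = false;
            // Escape sequences are only recognised when the escape char differs from the quote char.
            // With quote == quoteEscape (e.g. '"'), a doubled quote simply closes and reopens quoting.
            boolean distinctEscape = quoteEscape != quote;
            StringBuilder currentToken = new StringBuilder();
            for (int i = 0; i < line.length; i++) {
                char c = line[i];
                boolean tokenNotEmpty = currentToken.length() > 0;
                if (distinctEscape && c == quote && lastChar == quoteEscape && tokenNotEmpty) {
                    // Escaped quote: replace the escape char with a literal quote (inside or outside quotes).
                    currentToken.setCharAt(currentToken.length() - 1, c);
                } else if (distinctEscape && charToEscapeQuoteEscaping != quote && c == quoteEscape
                        && lastChar == charToEscapeQuoteEscaping && tokenNotEmpty) {
                    // Escaped escape char: collapse the pair into one literal escape char, and reset
                    // lastChar so the collapsed char cannot escape a following quote.
                    currentToken.setCharAt(currentToken.length() - 1, c);
                    lastChar = '\0';
                    continue;
                } else if (state == OUTSIDE_QT && c == delimiter) {
                    // Unquoted delimiter: the current token is complete.
                    strList.add(currentToken.toString());
                    currentToken.setLength(0);
                    quotedToken = false;
                } else if (c == quote) {
                    // An unescaped quote opens or closes a quoted section.
                    if (!removeQuotes) {
                        currentToken.append(c);
                    }
                    quotedToken = true;
                    state = (state == OUTSIDE_QT) ? INSIDE_QT : OUTSIDE_QT;
                } else {
                    // Ordinary character, or a delimiter inside quotes.
                    currentToken.append(c);
                }
                lastChar = c;
            }
            if (state == OUTSIDE_QT && lastChar == delimiter) {
                // The line ended with an unquoted delimiter: emit a trailing empty token.
                strList.add("");
            } else if (currentToken.length() > 0 || quotedToken) {
                strList.add(currentToken.toString());
            }
            return strList.toArray(new String[0]);
        }
    }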
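If the end goal is a key/value map rather than a token array, a regular expression can be an alternative to a hand-written state machine. This is a different technique from the parser above, offered only as a minimal sketch under these assumptions: keys contain no whitespace or `=`, quoted values use double quotes with no escape sequences, and unquoted values run up to the next whitespace. `RegexKeyValueParser` and `parse` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexKeyValueParser {
    // key = run of chars that are neither whitespace nor '=';
    // value = a double-quoted string (escapes NOT handled) or a run of non-whitespace chars.
    private static final Pattern PAIR =
            Pattern.compile("([^\\s=]+)=(?:\"([^\"]*)\"|(\\S*))");

    public static Map<String, String> parse(String line) {
        // LinkedHashMap keeps the fields in the order they appear in the log line.
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // group(2) is the quoted value without its quotes; group(3) is the unquoted value.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
        System.out.println(parse(line));
        // prints {dstcountry=United States, date=2018-12-13, time=23:47:32}
    }
}
```

The pattern is short but deliberately limited; escaped quotes inside values would need a more elaborate pattern or a character-by-character parser like the one above.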