I am writing a program to parse key-value based log lines like this:
dstcountry="United States" date=2018-12-13 time=23:47:32
I am using the Univocity parser to do that. Here is my code:
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setDelimiter(' ');
parserSettings.getFormat().setQuote('"');
parserSettings.getFormat().setQuoteEscape('"');
parserSettings.getFormat().setCharToEscapeQuoteEscaping('"');
CsvParser keyValueParser = new CsvParser(parserSettings);
String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
String[] resp = keyValueParser.parseLine(line);
But the parser gives me this output:
dstcountry="United,
States",
date=2018-12-13,
time=23:47:32
whereas the expected output was:
dstcountry="United States",
date=2018-12-13,
time=23:47:32
Is there any problem with the code or is this a parser bug?
Regards,
Hari
I ended up writing my own parser. I am pasting it here for future reference in case anybody needs it. Suggestions and comments are welcome.
// Imports needed by the enclosing class.
import java.util.ArrayList;
import java.util.List;

private static final int INSIDE_QT = 1;
private static final int OUTSIDE_QT = 0;
// Assumption: the original snippet uses removeQuotes without declaring it,
// so it is shown here as a field of the enclosing class.
private boolean removeQuotes = false;

public String[] parseLine(char delimiter, char quote, char quoteEscape, char charToEscapeQuoteEscaping, String logLine) {
    char[] line = logLine.toCharArray();
    List<String> strList = new ArrayList<>();
    int state = OUTSIDE_QT;
    char lastChar = '\0';
    StringBuilder currentToken = new StringBuilder();
    for (int i = 0; i < line.length; i++) {
        if (state == OUTSIDE_QT) {
            if (line[i] == delimiter) {
                // A delimiter outside quotes ends the current token.
                strList.add(currentToken.toString());
                currentToken.setLength(0);
            } else if (line[i] == quote) {
                if (lastChar == quoteEscape) {
                    // Escaped quote: replace the escape character with the quote itself.
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                } else {
                    // Opening quote: optionally keep it, then switch state.
                    if (!removeQuotes) {
                        currentToken.append(line[i]);
                    }
                    state = INSIDE_QT;
                }
            } else if (line[i] == quoteEscape) {
                if (lastChar == charToEscapeQuoteEscaping) {
                    // Escaped escape character: keep a single copy and leave lastChar unchanged.
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                    continue;
                } else {
                    currentToken.append(line[i]);
                }
            } else {
                currentToken.append(line[i]);
            }
        } else if (state == INSIDE_QT) {
            if (line[i] == quote) {
                if (lastChar != quoteEscape) {
                    // Closing quote: optionally keep it, then switch state.
                    if (!removeQuotes) {
                        currentToken.append(line[i]);
                    }
                    // Mark an empty quoted value so the token is not dropped at the end.
                    if (currentToken.length() == 0) {
                        currentToken.append('\0');
                    }
                    state = OUTSIDE_QT;
                } else {
                    currentToken.append(line[i]);
                }
            } else if (line[i] == quoteEscape) {
                if (lastChar == charToEscapeQuoteEscaping) {
                    // Escaped escape character: keep a single copy and leave lastChar unchanged.
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                    continue;
                } else {
                    currentToken.append(line[i]);
                }
            } else {
                // Delimiters inside quotes are part of the value.
                currentToken.append(line[i]);
            }
        }
        lastChar = line[i];
    }
    // A trailing delimiter yields a final empty token.
    if (lastChar == delimiter) {
        strList.add("");
    }
    if (currentToken.length() > 0) {
        strList.add(currentToken.toString());
    }
    return strList.toArray(new String[0]);
}
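For comparison, here is a much smaller sketch of the same idea for the case where escape characters are not needed. It only toggles an "inside quotes" flag and splits on the delimiter when the flag is off, then breaks each token at its first '='. The class and method names (KeyValueLineSketch, splitOutsideQuotes, toMap) are made up for illustration and are not part of Univocity or of the parser above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class KeyValueLineSketch {

    // Splits the line on the delimiter, but only when outside a quoted section.
    // Quote characters are kept in the tokens; escapes are NOT handled (assumption).
    static List<String> splitOutsideQuotes(String line, char delimiter, char quote) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == quote) {
                insideQuotes = !insideQuotes;
                current.append(c);
            } else if (c == delimiter && !insideQuotes) {
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());
        return tokens;
    }

    // Turns key=value tokens into an insertion-ordered map,
    // stripping one pair of surrounding quotes from each value.
    static Map<String, String> toMap(List<String> tokens, char quote) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String token : tokens) {
            int eq = token.indexOf('=');
            if (eq < 0) {
                continue; // skip tokens that are not key=value pairs
            }
            String key = token.substring(0, eq);
            String value = token.substring(eq + 1);
            if (value.length() >= 2
                    && value.charAt(0) == quote
                    && value.charAt(value.length() - 1) == quote) {
                value = value.substring(1, value.length() - 1);
            }
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
        List<String> tokens = splitOutsideQuotes(line, ' ', '"');
        System.out.println(tokens);
        // prints [dstcountry="United States", date=2018-12-13, time=23:47:32]
        System.out.println(toMap(tokens, '"'));
        // prints {dstcountry=United States, date=2018-12-13, time=23:47:32}
    }
}
```

This trades the escape handling of the full parser for brevity, so it is probably only suitable when values never contain an escaped quote.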