I am trying to parse a single line of delimiter-separated strings into a sequence of those strings. Any character may appear in a field; if a field contains a delimiter, it needs double quotes around it. To include a double quote inside such a field, the quote is escaped with a backslash (\").
I used this as a starting point: https://github.com/sirthias/parboiled2/blob/695ee6603359cfcb97734edf6dd1d27383c48727/examples/src/main/scala/org/parboiled2/examples/CsvParser.scala
My grammar looks like this:
class CsvParser(val input: ParserInput, val delimiter: String = ",") extends Parser {
  def line: Rule1[Seq[String]] = rule { record ~ EOI }
  def record = rule(oneOrMore(field).separatedBy(delimiter))

  def QUOTE = "\""
  def ESCAPED_QUOTE = "\\\""
  def DELIMITER_QUOTE = delimiter + "\""
  def WS = " \t".replace(delimiter, "")

  def field = rule { whiteSpace ~ ((QUOTE ~ escapedField ~ QUOTE) | unquotedField) ~ whiteSpace }
  def escapedField = rule { capture(zeroOrMore(noneOf(QUOTE) | ESCAPED_QUOTE)) ~> (_.replace(ESCAPED_QUOTE, QUOTE)) }
  def unquotedField = rule { capture(zeroOrMore(noneOf(DELIMITER_QUOTE))) }
  def whiteSpace = rule(zeroOrMore(anyOf(WS)))
}
When I call it with "quote\"key",1,2
I get Invalid input 'k', expected whiteSpace, ',' or 'EOI' (line 1, column 9)
What am I doing wrong? How would I debug this? (And as a bonus question: how would I extend the grammar to allow the delimiter to be multiple characters, like ##?)
Thank you!
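Parboiled2 seems to execute rules without backtracking into a choice that has already succeeded: once one alternative of a | b has matched, the parser commits to it and does not come back to try the other alternative when a later rule fails.
In this particular case
def escapedField = rule { capture(zeroOrMore(noneOf(QUOTE) | ESCAPED_QUOTE)) ~> (_.replace(ESCAPED_QUOTE, QUOTE)) }
noneOf(QUOTE) is tried first, so it consumes the \ of \" on its own. On the next iteration the parser is positioned at the ", where noneOf(QUOTE) fails and ESCAPED_QUOTE fails as well, because it would have to start at the \ that is already consumed. The repetition therefore stops, the closing QUOTE matches the escaped quote, and the parser then expects whitespace, a delimiter, or EOI but finds k, which is exactly the reported error.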
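The error is solved by trying ESCAPED_QUOTE before noneOf(QUOTE) inside the repetition, so that a \ followed by " is consumed as one unit:
def escapedField = rule { capture(zeroOrMore(ESCAPED_QUOTE | noneOf(QUOTE))) ~> (_.replace(ESCAPED_QUOTE, QUOTE)) }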
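To see why the order of the alternatives matters, here is a small stand-alone model of ordered choice and greedy repetition in plain Scala. It is only a sketch: the names OrderedChoiceDemo, Matcher, lit, choice and consumed are made up for illustration, and this is not how parboiled2 is implemented internally. It only mimics the "first matching alternative wins" behaviour described above.

```scala
// Hypothetical mini-model of PEG-style matching; NOT parboiled2's implementation.
object OrderedChoiceDemo {
  // A matcher takes the input and a position and returns the new position on success.
  type Matcher = (String, Int) => Option[Int]

  // Match a literal string at the current position.
  def lit(s: String): Matcher =
    (in, i) => if (in.startsWith(s, i)) Some(i + s.length) else None

  // Match any single character that is not contained in `chars`.
  def noneOf(chars: String): Matcher =
    (in, i) => if (i < in.length && chars.indexOf(in.charAt(i)) < 0) Some(i + 1) else None

  // Ordered choice: try `a`; only if `a` fails at this position, try `b`.
  def choice(a: Matcher, b: Matcher): Matcher =
    (in, i) => a(in, i).orElse(b(in, i))

  // Greedy repetition: keeps applying `m` and never gives characters back.
  def zeroOrMore(m: Matcher): Matcher = (in, i) => {
    var pos = i
    var next = m(in, pos)
    while (next.isDefined) { pos = next.get; next = m(in, pos) }
    Some(pos)
  }

  val Quote = "\""
  val EscapedQuote = "\\\""

  // Same ordering as in the question: the single-character alternative comes first.
  val original: Matcher = zeroOrMore(choice(noneOf(Quote), lit(EscapedQuote)))
  // Escaped quote first, so \" is consumed as one unit.
  val reordered: Matcher = zeroOrMore(choice(lit(EscapedQuote), noneOf(Quote)))

  // The prefix of `body` that `m` consumes when started at position 0.
  def consumed(m: Matcher, body: String): String =
    body.substring(0, m(body, 0).getOrElse(0))
}
```

Applied to the text after the opening quote of the failing input, that is quote\"key", the original ordering stops right after the backslash, while the reordered version consumes everything up to the real closing quote.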