
Is there a rule to match printable Unicode characters in parboiled2?


As part of a larger parser, I am writing a rule to match strings like the following using parboiled2:

Italiana Relè

I would like to use something simple like the following:

CharPredicate.Printable

But the parser fails with an org.parboiled2.ParseError because of the non-ASCII character (è) at the end of the string.

Is there a simple option that I'm not aware of for matching printable Unicode characters?


Solution

  • Take a look at https://github.com/sirthias/parboiled2/blob/master/parboiled-core/src/main/scala/org/parboiled2/CharPredicate.scala#L112: CharPredicate.Printable only covers printable ASCII, but it is very easy to define your own predicates, for instance:

    val latinSupplementCharsPredicate = CharPredicate('\u00c0' to '\u00dc') ++ CharPredicate('\u00e0' to '\u00fd')
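
    If listing ranges by hand gets tedious, a broader predicate can be built from a plain function with CharPredicate.from. The following is only a sketch: the names UnicodePredicates, isPrintable and NameParser are made up for illustration, and "printable" is assumed here to mean "a defined character that is neither an ISO control character nor a surrogate", which may or may not match what you need. Since the predicate tests one Char at a time, characters outside the Basic Multilingual Plane are deliberately rejected.

```scala
import org.parboiled2._

object UnicodePredicates {
  // Hypothetical definition of "printable": a defined character that is
  // neither an ISO control character nor half of a surrogate pair.
  def isPrintable(c: Char): Boolean =
    Character.isDefined(c) && !Character.isISOControl(c) && !Character.isSurrogate(c)

  // Function-backed predicate; slower than the ASCII bit-mask predicates,
  // but not limited to the ASCII range.
  val Printable: CharPredicate = CharPredicate.from(c => isPrintable(c))
}

// Illustrative parser: captures one or more printable characters up to end of input.
class NameParser(val input: ParserInput) extends Parser {
  def Name: Rule1[String] = rule {
    capture(oneOrMore(UnicodePredicates.Printable)) ~ EOI
  }
}

object Demo extends App {
  // run() returns a scala.util.Try; the accented input is expected to parse.
  println(new NameParser("Italiana Relè").Name.run())
}
```

    The same predicate can be combined with the built-in ones via ++ and --, for example to exclude a delimiter character from the match.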