scala parsing parser-combinators

How to filter reserved words in parser combinators?


I am using Scala's parser combinator framework, extending the RegexParsers trait. I have an identifier token which starts with a letter and can contain letters, digits, dashes and underscores, as long as it is not one of the reserved words. I tried to use the parser's not() function to stop reserved words from being used, but it also rejects identifiers that merely start with a reserved word.

def reserved = "and" | "or"

def identifier: Parser[String] = not(reserved) ~> """[a-zA-Z][\.a-zA-Z0-9_-]*""".r

However, when I try to parse an identifier like and-today, I get an error saying Expected Failure.
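
For reference, the behaviour can probably be reproduced with a minimal sketch like the one below. The wrapper object name ReproParser and the sample inputs are hypothetical; the two parsers are the ones shown above, and the scala-parser-combinators module is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object; only the two parsers come from the question.
object ReproParser extends RegexParsers {
  def reserved: Parser[String] = "and" | "or"

  // not(reserved) fails as soon as the input *starts* with a reserved word
  def identifier: Parser[String] =
    not(reserved) ~> """[a-zA-Z][\.a-zA-Z0-9_-]*""".r

  def main(args: Array[String]): Unit = {
    println(parseAll(identifier, "today"))     // accepted
    println(parseAll(identifier, "and-today")) // rejected because of the "and" prefix
  }
}
```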

How do I only filter reserved words if they are a full match of the token and not just a prefix?

Also, is there a way to improve the error reporting when using not()? In other cases the error shows the regular expression the parser was expecting, but in this case it just says Failure without any details.


Solution

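  • The not(reserved) version fails because reserved succeeds on the and prefix of and-today before the identifier regex ever runs. Instead, match the whole token first and then reject it only if the complete match is a reserved word. You can use filterWithError both to filter out the reserved words and to customize the error message like this: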

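        import scala.collection.immutable.HashSet

        // Inside a class or object that extends RegexParsers
        val reservedWords = HashSet("and", "or")

        val idRegex = """[a-zA-Z][\.a-zA-Z0-9_-]*""".r

        // Match the whole token first, then reject it only if it is a reserved word
        val identifier = Parser(input =>
          regex(idRegex)(input).filterWithError(
            !reservedWords.contains(_),
            reservedWord => s"YOUR ERROR MESSAGE FOR $reservedWord",
            input
          )
        )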
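
To try this end to end, here is a self-contained sketch of the same idea. The object name IdentifierParser, the helper parseIdentifier and the error message wording are assumptions made for illustration, and the scala-parser-combinators module is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical object name; the identifier parser follows the approach above.
object IdentifierParser extends RegexParsers {
  private val reservedWords = Set("and", "or")
  private val idRegex = """[a-zA-Z][\.a-zA-Z0-9_-]*""".r

  // Match a complete token first, then fail only if the whole token is reserved.
  val identifier: Parser[String] = Parser { input =>
    regex(idRegex)(input).filterWithError(
      word => !reservedWords.contains(word),
      word => s"'$word' is a reserved word and cannot be used as an identifier",
      input
    )
  }

  // Hypothetical convenience wrapper that requires the whole input to be consumed.
  def parseIdentifier(text: String): ParseResult[String] =
    parseAll(identifier, text)

  def main(args: Array[String]): Unit = {
    println(parseIdentifier("and-today")) // reserved prefix only, so it is accepted
    println(parseIdentifier("and"))       // full reserved word, so it is rejected
  }
}
```

Because the check runs on the complete match rather than on a prefix, and-today and order should both parse as identifiers, while and and or on their own should be rejected with the custom message.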