haskell, token, lexical-analysis, parsec

Lexical analysis of string token using Parsec


I have this parser for string literals, written with Haskell's Parsec library.

myStringLiteral = lexeme (
        do str <- between (char '\'')
                  (char '\'' <?> "end of string")
                  (many stringChar)
                  ; return (U.replace "''" "'" (foldr (maybe id (:)) "" str))

        <?> "literal string"
        )

Strings in my language are defined as alphanumeric characters enclosed in single quotes (example: 'this is my string'), but a string can also contain a ' itself. In that case the ' must be escaped by another ' (example: 'this is my string with '' inside of it').

What I need to do is look ahead whenever a ' appears while parsing a string and decide whether another ' follows it (if not, treat it as the end of the string). But I don't know how to do that. Any ideas? Thanks!


Solution

  • If the syntax is as simple as it seems, you can special-case the escaped single quote:

    escapeOrStringChar :: Parser Char
    escapeOrStringChar = try (string "''" >> return '\'') <|> stringChar
    

    and use that in

    myStringLiteral = lexeme $ do
        char '\''
        str <- many escapeOrStringChar
        char '\'' <?> "end of string"
        return str
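
For reference, here is a minimal self-contained sketch of the same idea that can be loaded into GHCi. The question does not show `stringChar` or `lexeme`, so the definitions below are hypothetical stand-ins: `stringChar` is assumed to accept any character except the quote, and `lexeme` is assumed to skip trailing whitespace. The `foldr (maybe id (:))` in the question suggests the real `stringChar` returns `Maybe Char`, in which case the types would need adjusting. `parseLiteral` is likewise a helper added only for trying things out.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the question's stringChar (assumption):
-- any single character other than the quote delimiter.
stringChar :: Parser Char
stringChar = noneOf "'"

-- Hypothetical stand-in for the question's lexeme (assumption):
-- run the parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Two quotes in a row stand for one literal quote. `try` rewinds the
-- input when only a single quote is found, so the closing-quote parser
-- still gets to see it.
escapeOrStringChar :: Parser Char
escapeOrStringChar = try (string "''" >> return '\'') <|> stringChar

myStringLiteral :: Parser String
myStringLiteral = lexeme literal <?> "literal string"
  where
    literal = do
      _ <- char '\''
      str <- many escapeOrStringChar
      _ <- char '\'' <?> "end of string"
      return str

-- Helper for experimenting (not part of the original answer):
-- parse one literal and require that the whole input is consumed.
parseLiteral :: String -> Either ParseError String
parseLiteral = parse (myStringLiteral <* eof) "<input>"
```

With these assumptions, `parseLiteral "'this is my string with '' inside of it'"` gives `Right "this is my string with ' inside of it"`. The important detail is that `stringChar` itself must reject a lone `'`. If it accepted the quote, `many escapeOrStringChar` would swallow the closing delimiter and the literal would never terminate.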