
In Parsec, is there a way to prevent lexeme from consuming newlines?


All of the parsers in Text.Parsec.Token politely use lexeme to eat whitespace after a token. Unfortunately for me, whitespace includes newlines, which I want to use as expression terminators. Is there a way to convince lexeme to leave newlines alone?


Solution

  • No, there is not. Here is the relevant code.

    From Text.Parsec.Token:

    lexeme p
        = do{ x <- p; whiteSpace; return x  }
    
    
    --whiteSpace
    whiteSpace
        | noLine && noMulti  = skipMany (simpleSpace <?> "")
        | noLine             = skipMany (simpleSpace <|> multiLineComment <?> "")
        | noMulti            = skipMany (simpleSpace <|> oneLineComment <?> "")
        | otherwise          = skipMany (simpleSpace <|> oneLineComment <|> multiLineComment <?> "")
        where
          noLine  = null (commentLine languageDef)
          noMulti = null (commentStart languageDef)
    

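    Notice that the where clause of whiteSpace only looks at the comment options; nothing in it lets you configure which characters count as whitespace. simpleSpace is defined as skipMany1 (satisfy isSpace), and isSpace accepts newlines. The lexeme function calls whiteSpace, and lexeme is used liberally throughout the rest of Text.Parsec.Token.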
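Since lexeme itself cannot be configured, one workaround is to skip Text.Parsec.Token for the tokens that matter and write a small lexeme of your own. The following is a minimal sketch, not part of Parsec: the names lineSpace, lexeme', identifier', terminator, expression and program are all hypothetical, and an "expression" is simplified here to a run of identifiers on one line.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip spaces and tabs only; newlines stay in the input.
lineSpace :: Parser ()
lineSpace = skipMany (oneOf " \t")

-- Hypothetical stand-in for Text.Parsec.Token's lexeme:
-- run the parser, then eat trailing horizontal whitespace.
lexeme' :: Parser a -> Parser a
lexeme' p = p <* lineSpace

-- A simplified identifier: one or more letters.
identifier' :: Parser String
identifier' = lexeme' (many1 letter)

-- One or more newlines terminate an expression.
terminator :: Parser ()
terminator = skipMany1 (newline *> lineSpace)

-- Placeholder grammar: an expression is a run of identifiers.
expression :: Parser [String]
expression = many1 identifier'

-- Expressions separated (and optionally ended) by newlines.
program :: Parser [[String]]
program = lineSpace *> (expression `sepEndBy` terminator) <* eof
```

With these definitions, parse program "" "foo bar\nbaz\n" should yield Right [["foo","bar"],["baz"]], because the newline survives each token and is consumed only by terminator. The cost is that every token parser has to be rebuilt on top of lexeme' instead of coming from makeTokenParser.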


    Update Sept. 28, 2015

    The ultimate solution for me was to use a proper lexical analyser (Alex). Parsec does a very good job as a parsing library, and it is a credit to its design that it can be mangled into doing lexical analysis, but for anything beyond small and simple projects that approach quickly becomes unwieldy. I now use Alex to produce a linear list of tokens, and Parsec then turns those tokens into an AST.
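
To illustrate the second half of that pipeline, here is a rough sketch of Parsec running over a list of tokens rather than characters. The Token type and every parser name below are hypothetical; in a real project a lexer such as Alex would produce the tokens, ideally with source positions attached, which this sketch ignores.

```haskell
import Text.Parsec

-- Hypothetical token type standing in for a lexer's output.
data Token = TIdent String | TNewline
  deriving (Show, Eq)

type TokParser a = Parsec [Token] () a

-- Accept one token if the function returns Just.
-- Source positions are left unchanged in this sketch.
satisfyTok :: (Token -> Maybe a) -> TokParser a
satisfyTok = tokenPrim show (\pos _ _ -> pos)

ident :: TokParser String
ident = satisfyTok f
  where
    f (TIdent s) = Just s
    f _          = Nothing

newlineTok :: TokParser ()
newlineTok = satisfyTok (\t -> if t == TNewline then Just () else Nothing)

-- A statement is a run of identifiers ended by a newline token.
statement :: TokParser [String]
statement = many1 ident <* newlineTok

program :: TokParser [[String]]
program = many statement <* eof
```

Because newlines arrive as explicit TNewline tokens, the grammar decides where they are significant, and no whitespace skipping is needed in the parser at all.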