Make a parser ignore all redundant whitespace

Say I have a parser p in Parsec and I want it to ignore all superfluous/redundant whitespace. For example, say I define a list as starting with "[", ending with "]", and containing integers separated by whitespace. I don't want any errors if there is whitespace in front of the "[", after the "]", between the "[" and the first integer, and so on.

In my case, I want this to work for my parser for a toy programming language.

I will update with code if that is requested/necessary.

Solution

Use combinators to say what you mean:

import Control.Applicative
import Text.Parsec
import Text.Parsec.String

program :: Parser [[Int]]
program = spaces *> many1 term <* eof

term :: Parser [Int]
term = list

list :: Parser [Int]
list = between listBegin listEnd (number `sepBy` listSeparator)

listBegin, listEnd, listSeparator :: Parser Char
listBegin = lexeme (char '[')
listEnd = lexeme (char ']')
listSeparator = lexeme (char ',')

lexeme :: Parser a -> Parser a
lexeme parser = parser <* spaces

number :: Parser Int
number = lexeme $ do
  digits <- many1 digit
  return (read digits :: Int)

Try it out:

λ :l Parse.hs
Ok, modules loaded: Main.
λ parseTest program " [1, 2, 3] [4, 5, 6] "
[[1,2,3],[4,5,6]]

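This lexeme combinator takes a parser and skips arbitrary whitespace after it. You then only need to wrap the primitive tokens of your language in lexeme, such as listSeparator and number. Because every token consumes its own trailing whitespace, the only spot left uncovered is the very start of the input, which program handles with a single leading spaces.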
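The list above uses commas as separators, while the question asked for integers separated only by whitespace. A minimal sketch of that variant follows, as a standalone file. The names spaceList and spaceLists are made up for illustration. Since number already skips its trailing whitespace via lexeme, no separator parser should be needed and many number is enough.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- One or more digits, read as an Int, followed by optional whitespace.
number :: Parser Int
number = lexeme (read <$> many1 digit)

-- Hypothetical variant: elements are separated by whitespace only.
-- 'number' consumes at least one digit, so 'many number' cannot loop forever.
spaceList :: Parser [Int]
spaceList = between (lexeme (char '[')) (lexeme (char ']')) (many number)

-- Leading whitespace is skipped once; every token handles what follows it.
spaceLists :: Parser [[Int]]
spaceLists = spaces *> many1 spaceList <* eof
```

With this loaded in GHCi, parseTest spaceLists " [1 2 3] [ 4 5 ] " prints [[1,2,3],[4,5]].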

Alternatively, you can parse the stream of characters into a stream of tokens, then parse the stream of tokens into a parse tree. That way, whitespace is dealt with once in the lexer, and both the lexer and parser can be greatly simplified. It’s only worth doing for larger grammars, though, where maintainability is a concern, and you have to use some of the lower-level Parsec API such as tokenPrim.
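As a rough sketch of the two-stage approach, assuming the same comma-separated list language as above: the Token type and the names lexer, satisfyTok, tok, int, listP, programP and parseProgram are all invented for this example. Only tokenPrim, getPosition and the usual combinators come from Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical token type for the list language.
data Token = TOpen | TClose | TComma | TInt Int
  deriving (Eq, Show)

-- Stage 1: characters to tokens. Whitespace is discarded here and nowhere else.
-- Each token is paired with the source position where it started.
lexer :: Parser [(SourcePos, Token)]
lexer = spaces *> many (positioned <* spaces) <* eof
  where
    positioned = (,) <$> getPosition <*> rawToken
    rawToken =  TOpen  <$ char '['
            <|> TClose <$ char ']'
            <|> TComma <$ char ','
            <|> TInt . read <$> many1 digit

-- Stage 2: a parser whose input stream is the token list.
type TokParser a = Parsec [(SourcePos, Token)] () a

-- Accept one token if the function returns Just, built on tokenPrim.
satisfyTok :: (Token -> Maybe a) -> TokParser a
satisfyTok f = tokenPrim (show . snd) nextPos (f . snd)
  where
    -- Advance to the position of the next token, if there is one.
    nextPos _ _ ((pos, _) : _) = pos
    nextPos _ (pos, _) []      = pos

-- Match one specific token.
tok :: Token -> TokParser ()
tok t = satisfyTok (\t' -> if t == t' then Just () else Nothing)

-- Match an integer token and return its value.
int :: TokParser Int
int = satisfyTok (\t -> case t of
                          TInt n -> Just n
                          _      -> Nothing)

listP :: TokParser [Int]
listP = between (tok TOpen) (tok TClose) (int `sepBy` tok TComma)

programP :: TokParser [[Int]]
programP = many1 listP <* eof

-- Run the lexer, then feed its tokens to the token parser.
parseProgram :: String -> Either ParseError [[Int]]
parseProgram src = parse lexer "" src >>= parse programP ""
```

Note that the stage 2 grammar contains no mention of whitespace at all, which is the simplification described above. The cost is the extra Token type and the tokenPrim plumbing.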