Difficulty getting a Parsec parser to skip spaces correctly


I'm new to Parsec (and to parsers in general), and I'm having some trouble with this parser I wrote:

list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'

The idea is to parse lists in this format (I'm working up to s-expressions):

(firstElement secondElement thirdElement and so on)

I wrote this code to test it:

import Control.Applicative
import Text.ParserCombinators.Parsec hiding (many)

list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'

test s = do
  putStrLn $ "Testing " ++ show s ++ ":"
  parseTest list s
  putStrLn ""

main = do
  test "()"
  test "(hello)"
  test "(hello world)"
  test "( hello world)"
  test "(hello world )"
  test "( )"

This is the output I get:

Testing "()":
[]

Testing "(hello)":
["hello"]

Testing "(hello world)":
["hello","world"]

Testing "( hello world)":
["hello","world"]

Testing "(hello world )":
parse error at (line 1, column 14):
unexpected ")"
expecting space or letter

Testing "( )":
parse error at (line 1, column 3):
unexpected ")"
expecting space or letter

As you can see, it fails when there is whitespace between the last element of the list and the closing ). I don't understand why that whitespace isn't consumed by the spaces I put just before <* char ')'. What silly mistake have I made?


Solution

  • The problem is that the trailing whitespace is consumed by the spaces parser inside the argument to many,

    list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'
                         --  ^^^^^^ that one
    

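    and then the parser expects some letter but finds a closing parenthesis. Since spaces has already consumed input at that point, Parsec does not backtrack, so many fails outright instead of stopping and leaving the whitespace to the outer spaces.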

    To solve it, consume whitespace only after tokens (including once after the opening parenthesis),

    list = char '(' *> spaces *> many (some letter <* spaces) <* char ')'
    

    which works as desired:

    $ runghc lisplists.hs 
    Testing "()":
    []
    
    Testing "(hello)":
    ["hello"]
    
    Testing "(hello world)":
    ["hello","world"]
    
    Testing "( hello world)":
    ["hello","world"]
    
    Testing "(hello world )":
    ["hello","world"]
    
    Testing "( )":
    []
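
The "consume whitespace after each token" rule is often packaged as a small helper. What follows is only a sketch under some assumptions: lexeme, word, parseList and listTry are names introduced here for illustration and do not appear in the original code, and the module imports Text.Parsec instead of the compatibility module used in the question. The second parser, listTry, keeps the shape of the original list but wraps the element parser in try, so that a failure after whitespace has been consumed is rewound and many can stop cleanly.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: run a parser, then skip any whitespace after it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- One or more letters, followed by optional whitespace.
word :: Parser String
word = lexeme (many1 letter)

-- Same grammar as the corrected parser: whitespace is only ever
-- consumed after a token, so `many word` never fails halfway through.
list :: Parser [String]
list = lexeme (char '(') *> many word <* char ')'

-- Alternative: keep the leading `spaces`, but let `try` rewind the
-- input when `many1 letter` fails after whitespace was consumed.
listTry :: Parser [String]
listTry = char '(' *> many (try (spaces *> many1 letter)) <* spaces <* char ')'

-- Hypothetical convenience wrapper for experimenting in GHCi.
parseList :: String -> Either ParseError [String]
parseList = parse list "<input>"
```

In GHCi, parseList "(hello world )" should give Right ["hello","world"] and parseList "( )" should give Right []. The lexeme version is usually preferable, since it avoids backtracking altogether; the try version is mainly useful for seeing why the original parser failed.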