
Nested sepBy1 with same delimiter


#!/usr/bin/env runhaskell

import           Control.Applicative           ((<|>))
import           Text.Parsec.Char
import           Text.ParserCombinators.Parsec hiding (spaces, (<|>))

main :: IO ()
main = print $ parse p "" "a\nb\n\nc\nd"
  where
    p  = sepBy1 (try pp) newline  -- groups, separated by newline
    pp = sepBy1 l newline         -- words, separated by newline
    l  = many1 letter             -- a word: one or more letters

I am trying to parse this:

a
b

c
d

into this: [["a", "b"], ["c", "d"]]. I tried fiddling with try, but it doesn't seem to work.

It is probably something quite basic; please explain what is going on in your answer (I am a beginner in Haskell and Parsec).

Edit: I forgot to add the error message:

Left (line 3, column 1):
unexpected "\n"
expecting letter

Solution

  • The problem seems to come from how sepBy1 is implemented, because the error appears even for parse pp "" "a\nb\n". While we might expect this to return Right ["a","b"], it fails the same way, with expecting letter (this time at the end of input rather than at a "\n").

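    So it looks like sepBy1 works as expected, except when the input ends with the separator: once sepBy1 has consumed a separator, it commits to parsing another element, and if that element fails, the whole parse fails instead of backtracking. On its own this is harmless, because there's another parser combinator for that case (sepEndBy1, which accepts an optional trailing separator). But with two nested sepBy1s using the same separator, it's a problem: the inner sepBy1 consumes the first newline of the blank line and then fails on the second one.

    Wrapping the inner parser in try, as in the question, doesn't help: try only rewinds to the start of the whole group, so the outer sepBy1 then fails on its very first element.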

    The only solution I found is to write your own backtracking sepBy1, use it for the inner level, and use a blank line (two newlines) as the outer separator:

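    import Text.Parsec
    import Text.Parsec.String (Parser)
    
    main :: IO ()
    main = print $ parse p "" "a\nb\n\nc\nd"
      where  pp = mySepBy1 l newline              -- one group: words on single newlines
             l  = many1 letter
             p  = sepBy1 pp (newline >> newline)  -- groups: separated by a blank line
    
    -- Like sepBy1, but try rewinds a separator that is not followed by an
    -- element, so the inner parser leaves the blank line for the outer one.
    mySepBy1 :: Parser a -> Parser sep -> Parser [a]
    mySepBy1 parser separator = do
      x <- parser
      xs <- many (try $ separator >> parser)
      return (x:xs)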
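To see the behavior described above in isolation, here is a small sketch (not part of the original answer). It assumes parsec 3's Text.Parsec module; word and groups are hypothetical helper names introduced only for illustration. It compares plain sepBy1 with the try-based mySepBy1 on input that ends in a separator, and also tries sepEndBy1 for the inner level of the question's input: each group swallows one optional trailing newline and leaves the other one for the outer separator.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: a "word" is one or more letters.
word :: Parser String
word = many1 letter

-- The answer's backtracking sepBy1: try makes "separator then element"
-- fail without consuming input, so many stops cleanly.
mySepBy1 :: Parser a -> Parser sep -> Parser [a]
mySepBy1 parser separator = do
  x  <- parser
  xs <- many (try (separator >> parser))
  return (x : xs)

-- Hypothetical alternative (not from the original answer): the inner
-- sepEndBy1 consumes the newline ending a group, and the outer sepBy1
-- uses the newline of the blank line as its separator.
groups :: Parser [[String]]
groups = sepBy1 (sepEndBy1 word newline) newline

main :: IO ()
main = do
  -- Plain sepBy1 commits after consuming the trailing "\n", then fails
  print (parse (sepBy1 word newline) "" "a\nb\n")
  -- The try-based version stops before the trailing "\n" and succeeds
  print (parse (mySepBy1 word newline) "" "a\nb\n")
  -- The question's input, parsed with sepEndBy1 for the inner level
  print (parse groups "" "a\nb\n\nc\nd")
```

Neither version handles more than one blank line between groups: the outer separator consumes input and then the next group fails, so the parse errors out as before.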