How do I sepBy ambiguous parse with Parsec?


I am trying to split a string on a delimiter that consists of multiple characters. The problem is that each of those characters can also occur on its own inside the non-delimiter text. For example, I have foo*X*bar*X*baz, where the delimiter is *X*, so I want to get [foo, bar, baz], but each of those fields may itself contain * or X.

I have tried

sepBy (many anyChar) delimiter

but that just swallows the whole string, giving "foo*X*bar*X*baz". If I do

sepBy anyChar (optional delimiter)

it filters out the delimiters correctly, but doesn't partition the list, returning "foobarbaz". I don't know which other combination I could try.


Solution

  • Perhaps you want something like this:

    tok = (:) <$> anyToken <*> manyTill anyChar (try (() <$ string sep) <|> eof)
    

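    The anyToken forces every tok to consume at least one character, so many tok cannot keep succeeding on empty input once the end of input is reached. The try makes the separator match backtrack when only a prefix of it (such as *X) is present, so those characters are not consumed as a failed separator and stay part of the field. Note that, because of the leading anyToken, a field parsed this way is never empty.

    Full code for a test: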

    module ParsecTest where
    import Control.Applicative ((<$), (<$>), (<*>))
    import Data.List (intercalate)
    import Text.Parsec
    import Text.Parsec.String
    
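    sep, msg :: String
    sep = "*X*"
    -- Each field contains '*' and 'X', but never the full separator.
    msg = intercalate sep ["foXo", "ba*Xr", "bX*az"]
    
    -- One field: a first character, then everything up to the next
    -- separator (which is consumed) or the end of input.
    tok :: Parser String
    tok = (:) <$> anyToken <*> manyTill anyChar (try (() <$ string sep) <|> eof)
    
    toks :: Parser [String]
    toks = many tok
    
    -- Evaluates to Right ["foXo","ba*Xr","bX*az"].
    test :: Either ParseError [String]
    test = runP toks () "" msg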
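
Since the question asks about sepBy specifically, here is a minimal sketch of an alternative that keeps sepBy and also accepts empty fields. It is not part of the answer above. The names field, fields and splitOnSep are made up for illustration, and the code assumes a GHC recent enough that the Prelude exports *> and <*. The idea is to let each field stop just before a full separator by checking notFollowedBy (string sep) in front of every character.

```haskell
module Main where

import Data.List (intercalate)
import Text.Parsec
import Text.Parsec.String (Parser)

sep :: String
sep = "*X*"

-- Hypothetical helper: one field is any run of characters (possibly
-- empty) that stops right before the next full separator or at the
-- end of input. notFollowedBy never consumes input, so a partial
-- separator such as "*X" stays part of the field.
field :: Parser String
field = many (notFollowedBy (string sep) *> anyChar)

-- Hypothetical helper: fields separated by the full separator,
-- requiring that the whole input is consumed.
fields :: Parser [String]
fields = field `sepBy` try (string sep) <* eof

-- Hypothetical wrapper around parse, for convenience.
splitOnSep :: String -> Either ParseError [String]
splitOnSep = parse fields ""

main :: IO ()
main = do
  -- prints Right ["foXo","ba*Xr","bX*az"]
  print (splitOnSep (intercalate sep ["foXo", "ba*Xr", "bX*az"]))
  -- prints Right ["a","","b"]
  print (splitOnSep "a*X**X*b")
```

The trade-off in this sketch is that notFollowedBy re-checks the separator before every character, which is simple but does more backtracking than the manyTill version.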