Haskell attoparsec: "Failed reading: satisfyWith"


I want to parse text like "John","Kate","Ruddiger" into a list of Strings.

I started by parsing "John", into a Name (a wrapper around String), but even that fails with Fail "\"," [","] "Failed reading: satisfyWith".

Question A: Why does this error occur and how can I fix it? (I didn't find a call to satisfyWith in attoparsec's source code.)

Question B: How can I make the parser not require a comma after the last name?

{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Char8 as P
import qualified Data.ByteString.Char8 as BS
import Control.Applicative(many)

data Name = Name String deriving Show

readName =  P.takeWhile (/='"')

entryParser :: Parser Name
entryParser = do
    P.char '"'
    name <- readName
    P.char ','
    return $ Name (BS.unpack name)

someEntry :: IO BS.ByteString
someEntry = do
    return $ BS.pack "\"John\","

main :: IO()
main = do
    someEntry >>= print . parse entryParser  

I am using GHC 7.6.3 and attoparsec-0.11.3.4.


Solution

  • Question A: Why does this error occur and how can I fix it? (I didn't find a call to satisfyWith in attoparsec's source code.)

    readName =  P.takeWhile (/='"')
    

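    takeWhile consumes input only while the predicate holds. Therefore, after readName has read the name, the closing " has not been consumed, and P.char ',' fails when it meets it. (The satisfyWith in the message most likely comes from P.char itself: as far as I can tell, char is built on satisfy, which the library in turn defines via satisfyWith.) This is easy to see if we remove P.char ',' from entryParser: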

    entryParser = P.char '"' >> fmap (Name . BS.unpack) readName
    
    $ runhaskell SO.hs
    Done "\"," Name "John"

    You need to consume the closing " as well:

    entryParser :: Parser Name
    entryParser = do
        P.char '"'
        name <- readName
        P.char '"' -- consume the closing quote (this line was missing)
        P.char ','
        return $ Name (BS.unpack name)
    

  • Question B: How can I make the parser not require a comma after the last name?

    Use sepBy.
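
    As a rough sketch of what sepBy does on its own: it parses zero or more items with a separator between them, so no trailing separator is needed. The quoted helper and the names fields and fields1 below are made up for this illustration; sepBy, sepBy1, char and parseOnly are attoparsec functions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.ByteString.Char8 (Parser, char, parseOnly, sepBy, sepBy1)
import qualified Data.Attoparsec.ByteString.Char8 as P
import qualified Data.ByteString.Char8 as BS

-- Hypothetical helper: one double-quoted field, returned without its quotes.
quoted :: Parser BS.ByteString
quoted = do
    _ <- char '"'
    field <- P.takeWhile (/= '"')
    _ <- char '"'
    return field

-- Zero or more fields; commas appear only between fields.
fields :: Parser [BS.ByteString]
fields = quoted `sepBy` char ','

-- Same idea, but at least one field is required.
fields1 :: Parser [BS.ByteString]
fields1 = quoted `sepBy1` char ','

main :: IO ()
main = do
    print (parseOnly fields "\"John\",\"Kate\",\"Ruddiger\"")  -- Right ["John","Kate","Ruddiger"]
    print (parseOnly fields "")                                 -- Right []
    print (parseOnly fields1 "")                                -- a Left: there is no field to parse
```

    If an empty list should count as an error, sepBy1 is probably the better fit.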


    Now that your questions have been cleared up, let's make things a little easier. Don't consume the , at all in entryParser; instead, take only the name:

    entryParser = P.char '"' *> fmap ( Name . BS.unpack ) readName <* P.char '"'
    

    In case you don't know (*>) and (<*): they're both from Control.Applicative, and they basically mean "discard the result on the asterisk's side". Both parsers still run and consume input; only one result is kept.
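
    A small sketch of the two operators in isolation may help. The parser names keepRight, keepLeft and inParens are invented for this example; char, decimal and parseOnly come from attoparsec.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((*>), (<*))
import Data.Attoparsec.ByteString.Char8 (Parser, char, decimal, parseOnly)

-- Runs both parsers, keeps only the right-hand result.
keepRight :: Parser Int
keepRight = char '#' *> decimal

-- Runs both parsers, keeps only the left-hand result.
keepLeft :: Parser Int
keepLeft = decimal <* char ';'

-- Combining both: the delimiters are consumed but their results are dropped.
inParens :: Parser Int
inParens = char '(' *> decimal <* char ')'

main :: IO ()
main = do
    print (parseOnly keepRight "#42")  -- Right 42
    print (parseOnly keepLeft "42;")   -- Right 42
    print (parseOnly inParens "(42)")  -- Right 42
    print (parseOnly keepLeft "42")    -- a Left: the discarded parser must still succeed
```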

    Now, in order to parse all comma-separated entries, we use sepBy entryParser (P.char ','). However, with parse this makes attoparsec return a Partial, because after the last name another comma could still follow:

    $ runhaskell SO.hs
    Partial _

    That's actually a feature of attoparsec you have to keep in mind:

    Attoparsec supports incremental input, meaning that you can feed it a bytestring that represents only part of the expected total amount of data to parse. If your parser reaches the end of a fragment of input and could consume more input, it will suspend parsing and return a Partial continuation.

    If you do want to use incremental input, use parse and feed. Otherwise, use parseOnly. The complete code for your example would be something like this:

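    {-# LANGUAGE OverloadedStrings #-}
    
    -- Data.Attoparsec.Char8 is the older, deprecated name for this module.
    import Data.Attoparsec.ByteString.Char8 as P
    import qualified Data.ByteString.Char8 as BS
    import Control.Applicative ((*>), (<*))
    
    data Name = Name String deriving Show
    
    -- Everything up to, but not including, the next double quote.
    readName :: Parser BS.ByteString
    readName = P.takeWhile (/= '"')
    
    -- One quoted name; the surrounding quotes are consumed and discarded.
    entryParser :: Parser Name
    entryParser = P.char '"' *> fmap (Name . BS.unpack) readName <* P.char '"'
    
    -- Names separated by commas; no trailing comma is required.
    allEntriesParser :: Parser [Name]
    allEntriesParser = sepBy entryParser (P.char ',')
    
    testString :: BS.ByteString
    testString = "\"John\",\"Martha\",\"test\""
    
    main :: IO ()
    main = print . parseOnly allEntriesParser $ testString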
    
    $ runhaskell SO.hs
    Right [Name "John",Name "Martha",Name "test"]
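
    For the incremental route mentioned above, here is a minimal sketch of parse and feed, assuming the input arrives in two chunks. The quoted, names and describe helpers are hypothetical names introduced for this example; feeding an empty ByteString is how attoparsec is told that no more input will come.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((*>), (<*))
import Data.Attoparsec.ByteString.Char8
    (IResult (..), Parser, Result, char, feed, parse, sepBy)
import qualified Data.Attoparsec.ByteString.Char8 as P
import qualified Data.ByteString.Char8 as BS

-- Hypothetical helper: one double-quoted name, returned without its quotes.
quoted :: Parser BS.ByteString
quoted = char '"' *> P.takeWhile (/= '"') <* char '"'

names :: Parser [BS.ByteString]
names = quoted `sepBy` char ','

-- Hypothetical helper: summarise a result as a plain String.
describe :: Result [BS.ByteString] -> String
describe (Done rest xs)    = "Done, leftover " ++ show rest ++ ", result " ++ show xs
describe (Partial _)       = "Partial (waiting for more input)"
describe (Fail rest _ err) = "Fail at " ++ show rest ++ ": " ++ err

main :: IO ()
main = do
    let r1 = parse names "\"John\",\"Ka"  -- first chunk ends in the middle of a name
    putStrLn (describe r1)                -- Partial (waiting for more input)
    let r2 = feed r1 "te\""               -- second chunk completes the name
    putStrLn (describe r2)                -- still Partial: another comma could follow
    let r3 = feed r2 BS.empty             -- an empty chunk signals end of input
    putStrLn (describe r3)                -- Done, leftover "", result ["John","Kate"]
```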