I want to parse text like "John","Kate","Ruddiger"
into list of Strings.
I tried to start with parsing "John",
to Name (alias for String) but it already fails with Fail "\"," [","] "Failed reading: satisfyWith"
.
Question A: Why does this error occur and how can I fix it? (I didn't find call to satisfyWith in attoparsec's source code)
Question B: How can I make the parser to not require a comma after the last name?
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Char8 as P
import qualified Data.ByteString.Char8 as BS
import Control.Applicative(many)
data Name = Name String deriving Show
readName = P.takeWhile (/='"')
entryParser :: Parser Name
entryParser = do
P.char '"'
name <- readName
P.char ','
return $ Name (BS.unpack name)
someEntry :: IO BS.ByteString
someEntry = do
return $ BS.pack "\"John\","
main :: IO()
main = do
someEntry >>= print . parse entryParser
I am using GHC 7.6.3 and attoparsec-0.11.3.4.
Question A: Why does this error occur and how can I fix it? (I didn't find call to satisfyWith in attoparsec's source code)
readName = P.takeWhile (/='"')
takeWhile
consumes as long as the predicate is true. Therefor, after you read the name, "
hasn't been consumed. This is easy to see if we remove P.char ','
from the entryParser
:
entryParser = P.char '"' >> fmap (Name . BS.unpack) readName
$ runhaskell SO.hs Done "\"," Name "John"
You need to consume the "
:
entryParser :: Parser Name
entryParser = do
P.char '"'
name <- readName
P.char '"' -- <<<<<<<<<<<<<<<<<<<<<<
P.char ','
return $ Name (BS.unpack name)
Question B: How can I make the parser to not require a comma after the last name?
Use sepBy
.
Now your questions has been cleared up, lets make things a little bit easier. Don't consume the ,
at all in entryParser
, instead, only take the name:
entryParser = P.char '"' *> fmap ( Name . BS.unpack ) readName <* P.char '"'
In case you don't know (*>)
and (<*)
, they're both from Control.Applicative
, and they basically mean "discard whatever is on the asterisks side".
Now, in order to parse all comma separated entries, we use sepBy entryParser (P.char ',')
. However, this will lead into attoparsec returning a Partial:
$ runhaskell SO.hs Partial _
That's actually a feature of attoparsec you have to keep in mind:
Attoparsec supports incremental input, meaning that you can feed it a bytestring that represents only part of the expected total amount of data to parse. If your parser reaches the end of a fragment of input and could consume more input, it will suspend parsing and return a
Partial
continuation.
If you do want to use incremental input, use parse
and feed
. Otherwise use parseOnly
. The complete code for your example would be something like
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Char8 as P
import qualified Data.ByteString.Char8 as BS
import Control.Applicative(many, (*>), (<*))
data Name = Name String deriving Show
readName = P.takeWhile (/='"')
entryParser :: Parser Name
entryParser = P.char '"' *> fmap ( Name . BS.unpack ) readName <* P.char '"'
allEntriesParser = sepBy entryParser (P.char ',')
testString = "\"John\",\"Martha\",\"test\""
main = print . parseOnly allEntriesParser $ testString
$ runhaskell SO.hs Right [Name "John",Name "Martha",Name "test"]