The code below outputs Right ["1<!>2<!>3"], but I need Right ["1", "2", "3"].
    import Text.ParserCombinators.Parsec

    response = contents :: CharParser () [String]
      where
        contents = sepBy content contentDelimiter
        contentDelimiter = string "<!>"
        content = many anyChar

    main = do
      putStrLn $ show $ parse response "Response" "1<!>2<!>3"
I suppose the problem here is that the content parser consumes all the input before sepBy gets to test the delimiter. So, my questions are:
Am I correct with my assumption? If not, what is the mistake I've made?
What solution would you recommend for such a problem? (Using Parsec)
* content has to match any string that does not contain the delimiter. The 1<!>2<!>3 is just an example; it could be dslkf\n><!>dsf<!>3 or anything else.
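Your assumption is correct: many anyChar never fails before the end of the input, so content consumes everything and sepBy never gets to see a delimiter. For your first example, you would replace
    content = many anyChar
with
    content = many digit
so that the content parser doesn't erroneously match the separator.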
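As a minimal sketch, here is the question's parser with only that one change applied. The top-level type signatures are my addition; everything else keeps the names from the question.

```haskell
import Text.ParserCombinators.Parsec

-- The separator between pieces of content.
contentDelimiter :: CharParser () String
contentDelimiter = string "<!>"

-- Content is digits only, so it stops at the first '<'
-- instead of running to the end of the input.
content :: CharParser () String
content = many digit

-- Zero or more pieces of content separated by "<!>".
response :: CharParser () [String]
response = content `sepBy` contentDelimiter
```

Loading this into ghci and typing parseTest response "1<!>2<!>3" prints ["1","2","3"].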
Maybe you want to match more than just digits, but even so, I advise you to think carefully about what is valid between the <!>s and write a parser that accepts exactly that.
Why?
Once you've got a really good parser for content, your definition for response will be perfect. This way your content can include mystring = "hello<!>mum" without being chopped by the top-level parser: the low-level stringLiteral parser will eat the whole "hello<!>mum", and the top-level parser will never see the <!> correctly and innocently included inside it.
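To make that concrete, here is a rough sketch of a content parser that treats quoted strings as single units. The grammar is an assumption for illustration only: the helper plainChar is a name I made up, and this stringLiteral handles no escape sequences and keeps its surrounding quotes in the result. Your real format will need its own rules.

```haskell
import Text.ParserCombinators.Parsec

-- try, so that a partial match such as "<!x" is not consumed.
contentDelimiter :: CharParser () String
contentDelimiter = try (string "<!>")

-- Assumed literal syntax: double quotes, no escapes.
-- The quotes are kept in the result.
stringLiteral :: CharParser () String
stringLiteral = do
  _    <- char '"'
  body <- many (noneOf "\"")
  _    <- char '"'
  return ("\"" ++ body ++ "\"")

-- Hypothetical helper: any character that is not a quote
-- and does not begin a delimiter.
plainChar :: CharParser () Char
plainChar = notFollowedBy contentDelimiter >> noneOf "\""

-- Content is a mix of whole string literals and plain characters.
content :: CharParser () String
content = concat <$> many (stringLiteral <|> fmap (: []) plainChar)

response :: CharParser () [String]
response = content `sepBy` contentDelimiter
```

With these definitions the <!> inside the quotes belongs to the literal, so the input mystring = "hello<!>mum"<!>2 is split into two pieces rather than three. An unterminated quote makes the parse fail, which seems the right outcome for malformed input.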
Generally,...
In most parsing situations it's best to be really clear what's allowed in your content, and parse only that, for three reasons:
Reusability is important. At the moment, if you use a parser that just splits on <!> and eats everything else, it's guaranteed to eat the whole input, and you won't be able to do any more parsing.
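A small sketch of what that buys you. The record format here (fields, then a ';', then a word) is entirely hypothetical and only serves to show a second parser picking up where the first one stopped.

```haskell
import Text.ParserCombinators.Parsec

-- Digits-only fields: parsing stops at the first character that is
-- neither a digit nor the start of a delimiter.
fields :: CharParser () [String]
fields = many digit `sepBy` string "<!>"

-- Hypothetical follow-on format: the fields, a ';', then a trailing word.
record :: CharParser () ([String], String)
record = do
  fs   <- fields
  _    <- char ';'
  word <- many1 letter
  return (fs, word)
```

Here parseTest record "1<!>2;ok" succeeds because fields leaves ";ok" unconsumed. If the fields were parsed with many anyChar instead, they would swallow ";ok" as well, and char ';' would fail at the end of the input.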
Bottom-up
Your parsers should work from the ground up - you described this very well in your comment as "stacking the parsers from specific to general".
It's easiest to write them in that order for ease of testing, so first write one that matches a stringChar, then stringLiteral before member before array before object before json before content, then response. You can have them calling each other recursively along the way. You can then use parseTest to test each little one as you go along; typing parseTest response "1<!>2<!>3" into ghci is quicker than rewriting main and compiling.
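For instance, the two smallest layers might look like the sketch below. The escape rule (a backslash followed by any single character) is my assumption, not something your format necessarily uses, and this stringLiteral returns its contents without the quotes.

```haskell
import Text.ParserCombinators.Parsec

-- Smallest piece: one character inside a string literal.
-- Assumed escape rule: a backslash followed by any single character,
-- which yields that character.
stringChar :: CharParser () Char
stringChar = (char '\\' >> anyChar) <|> noneOf "\"\\"

-- Built on stringChar: a double-quoted literal, returned without quotes.
stringLiteral :: CharParser () String
stringLiteral = between (char '"') (char '"') (many stringChar)

-- Try each layer in ghci as soon as it exists:
--   parseTest stringChar "a"
--   parseTest stringLiteral "\"hello<!>mum\""
```

Each parser can be checked the moment it is written, before member, array and the rest exist.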
Top-down?
It's not wrong to write your parser top-down, just harder. You can write
    response = content `sepBy` contentDelimiter
    content = json <|> somethingElse
    json = object <|> array
    array = ...
but nothing is testable until you've written the very smallest parser.