I am having trouble figuring this out.
If a string is followed by one or more newline characters with no spaces after them, it is the end of a line and I return the line. If a string is followed by one or more newline characters and then one or more spaces, it is a line continuation, and I keep going until I encounter newlines that are not followed by spaces. Then I return the joined line.
This just totally locked my brain. Please help.
UPDATE
In case my explanation above is confusing, here is an example:
From: John Doe <[email protected]>
To: [email protected]
Content-Type: multipart/alternative;
 boundary=047d7b2e4e3cdc627304eb094bfe
Given the above text, I should be able to parse three lines for further processing, like so:
["From: John Doe <[email protected]>", "To: [email protected]", "Content-Type: multipart/alternative; boundary=047d7b2e4e3cdc627304eb094bfe"]
Something like this pseudocode, perhaps (assuming you want to keep all the whitespace):
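import Text.Parsec
import Text.Parsec.String (Parser)

continuedLine :: Parser String
continuedLine = go "" where
  go s = do
    s' <- many (noneOf "\n")       -- text up to the next newline
    empties <- many (char '\n')    -- the run of newlines, if any
    let soFar = s ++ s' ++ empties
    -- a space right after the newlines means the line continues
    (char ' ' >> go (soFar ++ " ")) <|> return soFar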
Apply your favorite transformation to eliminate the deeply nested, left-associated ++s.
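One such transformation, as a minimal sketch: collect the pieces in a reversed list and concatenate once at the end, so that no ++ is ever applied to a long left operand. The name continuedLineChunks is hypothetical, and the grammar is meant to be the same as in the parser above (all whitespace is kept).

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical variant: same grammar as the parser above, but the pieces
-- are accumulated in reverse order and concatenated only once.
continuedLineChunks :: Parser String
continuedLineChunks = go []
  where
    go :: [String] -> Parser String
    go acc = do
      s'      <- many (noneOf "\n")   -- text up to the next newline
      empties <- many (char '\n')     -- the run of newlines, if any
      let acc' = empties : s' : acc   -- newest piece goes at the front
      -- a space means continuation; otherwise assemble the result
      (char ' ' >> go (" " : acc')) <|> return (concat (reverse acc'))
```

A difference list would work just as well; the reversed list is only the shortest thing to write down.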
EDIT: Hm, it just occurred to me that there's a subtlety I may have overlooked. If this is not a continuation, are you hoping to leave the newlines "unparsed", so to speak? If so, you can use try to do something like this:
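import Text.Parsec
import Text.Parsec.String (Parser)

continuedLine :: Parser String
continuedLine = go "" where
  -- newlines followed by a space announce a continuation
  continuationHerald = do
    empties <- many (char '\n')
    _ <- char ' '
    return (empties ++ " ")
  go s = do
    s' <- many (noneOf "\n")
    -- try backtracks over the newlines if no space follows them
    cont <- try (Just <$> continuationHerald) <|> return Nothing
    case cont of
      Nothing      -> return (s ++ s')
      Just empties -> go (s ++ s' ++ empties)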
Note that we go to some lengths to avoid putting the recursive call to go inside the try. This is an efficiency concern: doing so would cause the parser to refuse to give up on the alternate return Nothing branch, and prevent garbage collection of the beginning of the string being parsed.
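To get from a single continued line to the list of lines in the question, something along these lines might do. It is only a sketch under a few assumptions: the names logicalLine and logicalLines are made up, each continuation (newlines plus leading spaces) is collapsed to a single space to match the expected output in the question, and blank lines between entries are simply skipped. As above, only the herald sits inside try.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One logical line: a physical line plus any continuation lines.
-- Assumption: each continuation is joined with a single space.
logicalLine :: Parser String
logicalLine = do
  first <- many1 (noneOf "\n")
  rest  <- many continuation
  return (concat (first : rest))
  where
    -- newlines followed by spaces; try undoes the newlines if no space follows
    herald = try (many1 (char '\n') *> many1 (char ' '))
    continuation = herald *> ((' ' :) <$> many (noneOf "\n"))

-- All logical lines, separated (and optionally terminated) by newlines.
logicalLines :: Parser [String]
logicalLines =
  many (char '\n') *> (logicalLine `sepEndBy` many1 (char '\n')) <* eof
```

Running parse logicalLines "" on the sample text from the question (with the boundary line indented by a space) should give the three strings listed there, with the Content-Type entry joined into one line.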