Parsec: line continuation trouble


I am having trouble figuring this out.

If a string is followed by one or more newline characters and no spaces come after them, that is the end of the line and I return the line. If a string is followed by one or more newline characters and then one or more spaces, that is a line continuation, and I keep going until I encounter newlines that are not followed by spaces. Then I return the combined line.

This just totally locked my brain. Please help.

UPDATE

In case my explanation above is confusing, here is an example:

From: John Doe <[email protected]>
To: [email protected]
Content-Type: multipart/alternative;
  boundary=047d7b2e4e3cdc627304eb094bfe

Given the above text I should be able to parse 3 lines for further processing like so

["From: John Doe <[email protected]>", "To: [email protected]", "Content-Type: multipart/alternative; boundary=047d7b2e4e3cdc627304eb094bfe"]

Solution

  • Something like this pseudocode, perhaps (assuming you want to keep all the whitespace):

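    import Text.Parsec
    import Text.Parsec.String (Parser)

    continuedLine :: Parser String
    continuedLine = go "" where
        go s = do
            s'      <- many (noneOf "\n")   -- text up to the next newline
            empties <- many (char '\n')     -- the newline(s) ending this physical line
            let soFar = s ++ s' ++ empties
            -- a space right after the newlines means the logical line continues
            (char ' ' >> go (soFar ++ " ")) <|> return soFar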
    

    Apply your favorite transformation to eliminate the deeply-nested left-associated ++s.

    EDIT: Hm, it just occurred to me that there's a subtlety I may have overlooked. In case this is not a continuation, are you hoping to leave the newlines "unparsed", so to speak? If so, you can use try to do something like this:

    continuedLine = go "" where
        -- zero or more newlines followed by a space: the line continues
        continuationHerald = do
            empties <- many (char '\n')
            _ <- char ' '
            return (empties ++ " ")

        go s = do
            s'   <- many (noneOf "\n")
            -- try rewinds over the consumed newlines when no space follows them
            cont <- try (Just <$> continuationHerald) <|> return Nothing
            case cont of
                Nothing -> return (s ++ s')
                Just empties -> go (s ++ s' ++ empties)
    

    Note that we go to some length to avoid putting the recursive call to go inside the try. This is an efficiency concern: with the recursion inside the try, the parser would have to keep the alternative return Nothing branch alive, together with the input it would need to rewind to, for the whole recursion. That would prevent garbage collection of the beginning of the string being parsed.
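For a self-contained version, here is a minimal sketch along the same lines as the try-based parser above. It rests on a few assumptions that are not part of the answer: the newline(s) and leading spaces of a continuation are collapsed into a single space (to match the output the question asks for), only '\n' line endings are handled, and the names logicalLine, headerLines and sample are made up for illustration. It also collects the pieces in a list via many and joins them once, which is one way to avoid the left-nested ++s.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more newlines followed by one or more spaces announce a continuation.
-- Nothing is returned: the caller replaces the whole herald with one space
-- (an assumption made to match the output requested in the question).
continuationHerald :: Parser ()
continuationHerald = do
  _ <- many1 newline
  _ <- many1 (char ' ')
  return ()

-- A logical line: a non-empty first piece, then any number of continued pieces.
-- Only the herald sits inside try, so backtracking never spans more than the
-- newlines and spaces between two pieces.
logicalLine :: Parser String
logicalLine = do
  first <- many1 (noneOf "\n")
  rest  <- many (try continuationHerald >> many (noneOf "\n"))
  return (unwords (first : rest))

-- All logical lines of the input, separated (and optionally ended) by newlines.
headerLines :: Parser [String]
headerLines = do
  skipMany newline
  ls <- logicalLine `sepEndBy` many1 newline
  eof
  return ls

-- Hypothetical sample input; the Content-Type header is folded over two lines.
sample :: String
sample = unlines
  [ "Subject: hello"
  , "Content-Type: multipart/alternative;"
  , "  boundary=047d7b2e4e3cdc627304eb094bfe"
  , "MIME-Version: 1.0"
  , ""
  ]
```

Loading this in GHCi and evaluating parse headerLines "" sample should give a Right value holding three strings, with the folded Content-Type header joined into "Content-Type: multipart/alternative; boundary=047d7b2e4e3cdc627304eb094bfe". If you want to keep the original whitespace instead, have continuationHerald return what it consumed and concatenate rather than using unwords.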