Parsec permutation parsing


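I wrote this permutation parsing example:

import Control.Monad (liftM)
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Perm

data Entry = Entry {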
    first_name :: String
  , last_name :: String
  , date_of_birth :: Maybe String
  , nationality :: Maybe String
  , parentage :: Maybe String
} deriving (Show)

nameParser :: Parser (String, String)
nameParser = do
  first_name <- many1 upper
  endOfLine
  last_name <- many1 letter
  endOfLine
  return $ (first_name, last_name)

attributeParser :: String -> Parser String
attributeParser field = do
  string $ field ++ ": "
  value <- many1 (noneOf "\n")
  endOfLine
  return value

entryParser :: Parser Entry
entryParser = do
  (f, l) <- nameParser
  (d, n, p) <- permute ((,,)
    <$?> (Nothing, liftM Just (try $ attributeParser "Date of Birth"))
    <|?> (Nothing, liftM Just (try $ attributeParser "Nationality"))
    <|?> (Nothing, liftM Just (try $ attributeParser "Parentage"))
    )
  return $ Entry f l d n p

main :: IO ()
main = mapM_ (print . parse entryParser "") goodTests

goodTests :: [String]
goodTests =
  "AAKVAAG\nTorvild\nDate of Birth: 1 July\nNationality: Norwegian\nParentage: business executive\n" :
  "AAKVAAG\nTorvild\nNationality: Norwegian\nParentage: business executive\n" :
  "AAKVAAG\nTorvild\nParentage: business executive\nNationality: Norwegian\n" :
  "AAKVAAG\nTorvild\nParentage: business executive\n" :
  "AAKVAAG\nTorvild\nNationality: Norwegian\n" : []

I would like to extend Entry with new fields in the future, but doing so requires adding even more repetitive code to the entryParser function. Is there a way to make this function accept a list of parsers?

I started with this:

attributeParsers =
  map attributeParser ["Date of Birth", "Nationality", "Parentage"]

permuteParams =
  map (\p -> (Nothing, liftM Just (try p))) attributeParsers

But I could not come up with a correct way to fold permuteParams together with the <|?> operator (I guess it would require something smarter than the (,,) tuple constructor).


Solution

  • As a first step, you can abstract out the work you do for every component:

    attr :: String -> (Maybe String, Parser (Maybe String))
    attr txt = (Nothing, liftM Just (try $ attributeParser txt))
    

    With this, entryParser becomes:

    entryParser :: Parser Entry
    entryParser = do
      (f, l) <- nameParser
      (d, n, p) <- permute ((,,)
        <$?> attr "Date of Birth"
        <|?> attr "Nationality"
        <|?> attr "Parentage"
        )
      return $ Entry f l d n p
    

    Then, if you want, you can fold the attr calls into the infix combinators by defining two new operators:

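    (.$) :: (Maybe String -> b) -> String -> StreamPermParser String () b
    f .$ x = f <$?> attr x

    (.|) :: StreamPermParser String () (Maybe String -> b) -> String -> StreamPermParser String () b
    f .| x = f <|?> attr x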
    
    infixl 2 .$
    infixl 2 .|
    

    This gives you:

    entryParser :: Parser Entry
    entryParser = do
      (f, l) <- nameParser
      (d, n, p) <- permute ((,,)
        .$ "Date of Birth"
        .| "Nationality"
        .| "Parentage"
        )
      return $ Entry f l d n p
    

    Then you can simplify further by getting rid of the intermediate triple. All you're doing is building it and then applying its components to Entry f l, so you can just as well use Entry f l itself as the function that the permutation parser applies:

    entryParser :: Parser Entry
    entryParser = do
      (f, l) <- nameParser
      permute (Entry f l
        .$ "Date of Birth"
        .| "Nationality"
        .| "Parentage"
        )
    

    I think this is compact enough. If you really want some kind of fold, there are two options. One is to collect the permutation results in an intermediate list. This, however, only works as long as all the permutable attributes have the same type (they currently do), and it is not so nice because you have to make assumptions about the number of elements in that list. The other is to use a heterogeneous list or some type class magic, which leads to more complexity in the types and is, I think, not worth it here.
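For reference, the module below is a sketch of the answer's final entryParser assembled into one self-contained file. It assumes parsec 3, whose Text.Parsec.Perm module exports permute, (<$?>), (<|?>) and the StreamPermParser type. The explicit type signatures for attr, (.$) and (.|) are added here for clarity and are not part of the original answer; liftM Just is replaced by the equivalent Just <$>. Load it into GHCi and call, for example, parse entryParser "" "AAKVAAG\nTorvild\nNationality: Norwegian\n".

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Perm (StreamPermParser, permute, (<$?>), (<|?>))

data Entry = Entry
  { first_name    :: String
  , last_name     :: String
  , date_of_birth :: Maybe String
  , nationality   :: Maybe String
  , parentage     :: Maybe String
  } deriving (Show, Eq)

-- Two mandatory lines: an upper-case name, then a name made of letters.
nameParser :: Parser (String, String)
nameParser = do
  f <- many1 upper
  _ <- endOfLine
  l <- many1 letter
  _ <- endOfLine
  return (f, l)

-- One "Field: value" line; returns only the value.
attributeParser :: String -> Parser String
attributeParser field = do
  _ <- string (field ++ ": ")
  value <- many1 (noneOf "\n")
  _ <- endOfLine
  return value

-- An optional permutation component: Nothing if the line is absent.
-- `try` lets a partially matched field name backtrack without consuming.
attr :: String -> (Maybe String, Parser (Maybe String))
attr txt = (Nothing, Just <$> try (attributeParser txt))

infixl 2 .$
infixl 2 .|

-- Starts a permutation with its first optional component (wraps <$?>).
(.$) :: (Maybe String -> b) -> String -> StreamPermParser String () b
f .$ x = f <$?> attr x

-- Adds one more optional component; each use consumes one argument
-- of the function being built (wraps <|?>).
(.|) :: StreamPermParser String () (Maybe String -> b) -> String -> StreamPermParser String () b
p .| x = p <|?> attr x

entryParser :: Parser Entry
entryParser = do
  (f, l) <- nameParser
  permute (Entry f l
    .$ "Date of Birth"
    .| "Nationality"
    .| "Parentage")
```

The signatures also show why the question's fold over permuteParams cannot type-check: every .| (or <|?>) turns a permutation parser for Maybe String -> b into one for b, so the accumulator's type changes at each step and no single fold function fits all of them.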
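The intermediate-list option can be sketched roughly as follows. Note that this sketch does not use Text.Parsec.Perm at all: it swaps in a small hand-written loop, permuteAttrs (a hypothetical helper, not part of Parsec), that parses the optional "Field: value" lines in any order, each at most once, and returns their values in the order the field names were listed. Adding a field then only means adding its name to the list, but entryParser still has to pattern-match on a list of a fixed length, which is exactly the drawback mentioned above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Entry = Entry
  { first_name    :: String
  , last_name     :: String
  , date_of_birth :: Maybe String
  , nationality   :: Maybe String
  , parentage     :: Maybe String
  } deriving (Show, Eq)

nameParser :: Parser (String, String)
nameParser = do
  f <- many1 upper
  _ <- endOfLine
  l <- many1 letter
  _ <- endOfLine
  return (f, l)

attributeParser :: String -> Parser String
attributeParser field = do
  _ <- string (field ++ ": ")
  value <- many1 (noneOf "\n")
  _ <- endOfLine
  return value

-- Hypothetical helper (not part of Parsec): parses the optional
-- "Field: value" lines named in `keys` in any order, each at most once,
-- and returns their values in the order of `keys` (Nothing if absent).
permuteAttrs :: [String] -> Parser [Maybe String]
permuteAttrs keys = go []
  where
    -- `seen` collects the (field, value) pairs parsed so far.
    go seen = do
      let remaining = [k | k <- keys, k `notElem` map fst seen]
      -- `try` makes each alternative fail without consuming input, so
      -- `choice` can move on and `optionMaybe` can stop cleanly.
      next <- optionMaybe (choice [(,) k <$> try (attributeParser k) | k <- remaining])
      case next of
        Nothing -> return [lookup k seen | k <- keys]
        Just kv -> go (kv : seen)

entryParser :: Parser Entry
entryParser = do
  (f, l) <- nameParser
  attrs <- permuteAttrs ["Date of Birth", "Nationality", "Parentage"]
  case attrs of
    [d, n, p] -> return (Entry f l d n p)
    -- The list's length is not part of its type, so this match has to
    -- assume exactly three elements (one per field name above).
    _ -> fail "expected exactly three attribute values"
```

Because every value has type Maybe String, the list approach works here; an attribute of a different type would no longer fit in the same list.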