Search code examples
haskellparsec

Parsec permutation parser with separators


I want to parse assembly programs. I have a fixed format for parsing an assembly address: [ register + offset + label ] I implemented parsers for registers, offsets and labels. Now I want to create a parser which parses the whole address.

The combinations I want to accept:

[register]
[offset]
[label]
[register + offset]
[register + label]
[offset + label]
[register + offset + label]

And what I don't want to accept:

[]
[register offset]
[register + ]
...

Of course the simple solution is to have something like:

choice $ try (parseRegister >>= \r -> Address (Just r) Nothing Nothing)
       <|> try ...

But it is ugly and does not scale well with more types of elements. So I'm looking for a cleaner solution.


Solution

  • If you reorder your table, you see it’s a series of choices:

    [register + offset + label]
    [register + offset        ]
    [register          + label]
    [register                 ]
    [           offset + label]
    [           offset        ]
    [                    label]
    

    The grammar for which might be written:

    address      = '[' (register ('+' offset-label)? | offset-label) ']'
    offset-label = offset ('+' label)? | label
    

    Which in Applicative style is pretty straightforward, made only slightly noisy by wrapping everything in constructors:

    parseAddress :: Parser Address
    parseAddress = do
      (register, (offset, label)) <- between (char '[') (char ']') parseRegisterOffsetLabel
      return $ Address register offset label
    
    parseRegisterOffsetLabel :: Parser (Maybe Register, (Maybe Offset, Maybe Label))
    parseRegisterOffsetLabel = choice
      [ (,)
        <$> (Just <$> parseRegister)
        <*> option (Nothing, Nothing) (char '+' *> parseOffsetLabel)
      , (,) Nothing <$> parseOffsetLabel
      ]
    
    parseOffsetLabel :: Parser (Maybe Offset, Maybe Label)
    parseOffsetLabel = choice
      [ (,)
        <$> (Just <$> parseOffset)
        <*> option Nothing (char '+' *> (Just <$> parseLabel))
      , (,) Nothing . Just <$> parseLabel
      ]
    

    If we add a couple of utility functions:

    plus :: Parser a -> Parser a
    plus x = char '+' *> x
    
    just :: Parser a -> Parser (Maybe a)
    just = fmap Just
    

    We can clean up these implementations a bit:

    parseRegisterOffsetLabel = choice
      [ (,)
        <$> just parseRegister
        <*> option (Nothing, Nothing) (plus parseOffsetLabel)
      , (,) Nothing <$> parseOffsetLabel
      ]
    
    parseOffsetLabel = choice
      [ (,)
        <$> just parseOffset
        <*> option Nothing (plus (just parseLabel))
      , (,) Nothing <$> just parseLabel
      ]
    

    Then factor out the repetition, giving us a decent final solution:

    parseChain begin def rest = choice
      [ (,) <$> just begin <*> option def (plus rest)
      , (,) Nothing <$> rest
      ]
    
    parseRegisterOffsetLabel = parseChain
      parseRegister (Nothing, Nothing) parseOffsetLabel
    
    parseOffsetLabel = parseChain
      parseOffset Nothing (just parseLabel)
    

    I’ll let you take care of whitespace around + and inside [].