I want to parse assembly programs. I have a fixed format for parsing an assembly address: [ register + offset + label ]
I implemented parsers for registers, offsets and labels. Now I want to create a parser which parses the whole address.
The combinations I want to accept:
[register]
[offset]
[label]
[register + offset]
[register + label]
[offset + label]
[register + offset + label]
And what I don't want to accept:
[]
[register offset]
[register + ]
...
Of course the simple solution is to have something like:
choice $ try (parseRegister >>= \r -> Address (Just r) Nothing Nothing)
<|> try ...
But it is ugly and does not scale well with more types of elements. So I'm looking for a cleaner solution.
If you reorder your table, you see it’s a series of choices:
[register + offset + label]
[register + offset ]
[register + label]
[register ]
[ offset + label]
[ offset ]
[ label]
The grammar for which might be written:
address = '[' (register ('+' offset-label)? | offset-label) ']'
offset-label = offset ('+' label)? | label
Which in Applicative style is pretty straightforward, made only slightly noisy by wrapping everything in constructors:
parseAddress :: Parser Address
parseAddress = do
(register, (offset, label)) <- between (char '[') (char ']') parseRegisterOffsetLabel
return $ Address register offset label
parseRegisterOffsetLabel :: Parser (Maybe Register, (Maybe Offset, Maybe Label))
parseRegisterOffsetLabel = choice
[ (,)
<$> (Just <$> parseRegister)
<*> option (Nothing, Nothing) (char '+' *> parseOffsetLabel)
, (,) Nothing <$> parseOffsetLabel
]
parseOffsetLabel :: Parser (Maybe Offset, Maybe Label)
parseOffsetLabel = choice
[ (,)
<$> (Just <$> parseOffset)
<*> option Nothing (char '+' *> (Just <$> parseLabel))
, (,) Nothing . Just <$> parseLabel
]
If we add a couple of utility functions:
plus :: Parser a -> Parser a
plus x = char '+' *> x
just :: Parser a -> Parser (Maybe a)
just = fmap Just
We can clean up these implementations a bit:
parseRegisterOffsetLabel = choice
[ (,)
<$> just parseRegister
<*> option (Nothing, Nothing) (plus parseOffsetLabel)
, (,) Nothing <$> parseOffsetLabel
]
parseOffsetLabel = choice
[ (,)
<$> just parseOffset
<*> option Nothing (plus (just parseLabel))
, (,) Nothing <$> just parseLabel
]
Then factor out the repetition, giving us a decent final solution:
parseChain begin def rest = choice
[ (,) <$> just begin <*> option def (plus rest)
, (,) Nothing <$> rest
]
parseRegisterOffsetLabel = parseChain
parseRegister (Nothing, Nothing) parseOffsetLabel
parseOffsetLabel = parseChain
parseOffset Nothing (just parseLabel)
I’ll let you take care of whitespace around +
and inside []
.