As a sort of practice project, I want to implement a library that parses IRC messages. One of the things I'll have to parse is a shortname, given by the BNF:
shortname = ( letter / digit ) *( letter / digit / "-" ) *( letter / digit )
I have the parsers alphaNum and (alphaNum <|> char '-'), corresponding to those elements; that part is easy. However, I have trouble combining them to conform to the specification. The attempt between alphaNum alphaNum (alphaNum <|> char '-') doesn't work, and I have trouble incorporating lookAhead in a way that makes it do what I want.
The problem is that the final part (a letter or a number, but not a dash) is consumed by the previous part. I'd suggest changing the grammar to
shortname = ( letter / digit ) *( *( "-" ) ( letter / digit ) )
or, perhaps more efficiently,
shortname = 1*( letter / digit ) *( 1*( "-" ) 1*( letter / digit ) )
This ensures that while the inner part can contain letters, digits, and dashes, every run of dashes is followed by a letter or digit, so the name can never end in a dash. A Parsec solution could be:
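{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec

-- One or more alphanumerics, then any number of (dashes followed by alphanumerics).
shortName :: Stream s m Char => ParsecT s u m String
shortName = (++) <$> many1 alphaNum
                 <*> (concat <$> many ((++) <$> many1 (char '-')
                                            <*> many1 alphaNum))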
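To see the difference in practice, here is a minimal, self-contained sketch that runs both the greedy attempt and the rewritten parser on a few inputs. The names naive and accepts are hypothetical helpers introduced only for this illustration, and naive is my reading of the attempt in the question with a many wrapped around the middle part, so treat it as an assumption rather than the asker's exact code.

```haskell
{-# LANGUAGE FlexibleContexts #-}
module Main where

import Data.Either (isRight)
import Text.Parsec

-- Hypothetical reconstruction of the attempt from the question.
-- The greedy middle part also eats the final letter/digit, so the
-- closing alphaNum never finds anything left to match.
-- (between also discards the first and last characters.)
naive :: Stream s m Char => ParsecT s u m String
naive = between alphaNum alphaNum (many (alphaNum <|> char '-'))

-- The parser from the answer: every run of dashes must be followed
-- by at least one letter or digit.
shortName :: Stream s m Char => ParsecT s u m String
shortName = (++) <$> many1 alphaNum
                 <*> (concat <$> many ((++) <$> many1 (char '-')
                                            <*> many1 alphaNum))

-- Hypothetical helper: does the parser accept the whole input?
accepts :: Parsec String () String -> String -> Bool
accepts p = isRight . parse (p <* eof) ""

main :: IO ()
main = do
  print (accepts naive     "a-b")           -- False
  print (accepts shortName "a-b")           -- True
  print (accepts shortName "irc--server1")  -- True
  print (accepts shortName "a-")            -- False
  print (accepts shortName "-a")            -- False
```

Note that shortName uses no try, so a dash that is not followed by a letter or digit is a hard failure after input has been consumed, rather than a successful parse that stops before the dash. For a shortname that seems to be the desired behaviour, since a trailing dash is invalid anyway.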