Parsec - Input delimited by subset of main content


As a practice project, I want to implement a library that parses IRC messages. One of the things I'll have to parse is a shortname, given by the BNF:

shortname = ( letter / digit ) *( letter / digit / "-" ) *( letter / digit )

I have the parsers alphaNum and (alphaNum <|> char '-') for those elements, which is the easy part. However, I have trouble combining them to conform to the specification: between alphaNum alphaNum (alphaNum <|> char '-') doesn't work, and I can't work out how to incorporate lookAhead so that it does what I want.
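
For reference, the two obvious attempts can be reproduced with a minimal sketch like the one below. It assumes String input, reads the grammar as requiring the name to end in a letter or digit, and checks that the whole input is consumed; the names viaBetween, greedy, and accepts are hypothetical and only serve the illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- `between open close p` runs open, p, and close exactly once each,
-- so this matches exactly three characters and returns the middle one.
viaBetween :: Parser Char
viaBetween = between alphaNum alphaNum (alphaNum <|> char '-')

-- Greedy attempt: `many` also swallows the trailing letter or digit,
-- so the final `alphaNum` never has anything left to match.
greedy :: Parser String
greedy = do
  first  <- alphaNum
  middle <- many (alphaNum <|> char '-')
  final  <- alphaNum
  pure (first : middle ++ [final])

-- Hypothetical helper: does the parser accept the entire input?
accepts :: Parser a -> String -> Bool
accepts p s = either (const False) (const True) (parse (p <* eof) "" s)
```

Here viaBetween accepts a three-character name such as "a-b" but nothing longer, and greedy rejects even "a-b", because by the time the last alphaNum runs the input is already exhausted.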


Solution

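  • The problem is that the final part (a letter or a digit, but not a dash) is consumed by the previous part: the middle repetition also matches letters and digits, and it is greedy, so nothing is left for the last element. I'd suggest changing the grammar to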

    shortname = ( letter / digit ) *( *( "-" ) ( letter / digit ) )

    or, perhaps more efficiently,

    shortname = 1*( letter / digit ) *( 1*( "-" ) 1*( letter / digit ) )

    This ensures that, while the inner part can contain letters, digits, and dashes, any run of dashes is always followed by a letter or digit. A Parsec solution could be:

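    {-# LANGUAGE FlexibleContexts #-}
    import Text.Parsec

    -- One or more alphanumerics, then any number of (dashes, alphanumerics) groups.
    shortName :: Stream s m Char => ParsecT s u m String
    shortName = (++) <$> many1 alphaNum
                     <*> (concat <$> many ((++) <$> many1 (char '-')
                                                <*> many1 alphaNum))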
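
To try the parser out, a small usage sketch follows. It assumes plain String input and that the shortname must span the whole input, which will not hold inside a full IRC message parser; parseShortName is a hypothetical helper, not part of Parsec, and the parser is repeated so the block is self-contained.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec

-- The parser from the answer, repeated so the example stands alone.
shortName :: Stream s m Char => ParsecT s u m String
shortName = (++) <$> many1 alphaNum
                 <*> (concat <$> many ((++) <$> many1 (char '-')
                                            <*> many1 alphaNum))

-- Hypothetical helper: run shortName and require end of input afterwards.
parseShortName :: String -> Either ParseError String
parseShortName = parse (shortName <* eof) "shortname"
```

With this helper, "irc-server1" and "a--b" parse successfully and are returned unchanged, while "-abc", "abc-", and the empty string are rejected. Note that the dash group is not wrapped in try, so a trailing dash is a hard failure after consuming input rather than a partial match that stops before the dash.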