
Parse a sub-string with parsec (by ignoring unmatched prefixes)


I would like to extract the repository name from the first line of git remote -v, which is usually of the form:

origin git@github.com:some-user/some-repo.git (fetch)

I quickly made the following parser using parsec:

import Text.Parsec
import Text.Parsec.String (Parser)

-- | Parse the repository name from the output given by the first line of `git remote -v`.
repoNameFromRemoteP :: Parser String
repoNameFromRemoteP = do
    _ <- originPart >> hostPart
    _ <- char ':'
    firstPart <- namePart
    _ <- char '/'
    secondPart <- namePart
    _ <- string ".git"
    return $ firstPart ++ "/" ++ secondPart
    where
      originPart = many1 alphaNum >> space
      hostPart =  many1 alphaNum
               >> (string "@" <|> string "://")
               >> many1 alphaNum `sepBy` char '.'
      -- user and repository names may contain '-' and '_' (e.g. "some-user"),
      -- which plain alphaNum would reject
      namePart = many1 (alphaNum <|> oneOf "-_")

But this parser looks a bit awkward. I'm actually only interested in what follows the colon (":"), so it would be easier if I could write a parser for just that part.
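
A parser for just the part after the colon might look like the following minimal sketch. It assumes that user and repository names consist only of letters, digits, `-` and `_`; the names `nameP` and `afterColonP` are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumption: user and repository names contain only letters, digits, '-' and '_'.
nameP :: Parser String
nameP = many1 (alphaNum <|> oneOf "-_")

-- Parses ":some-user/some-repo.git" and returns "some-user/some-repo".
-- The input must start at the colon; nothing before it is skipped.
afterColonP :: Parser String
afterColonP = do
    _ <- char ':'
    user <- nameP
    _ <- char '/'
    repo <- nameP
    _ <- string ".git"
    return (user ++ "/" ++ repo)
```

On its own, `parse afterColonP "" ":some-user/some-repo.git"` gives `Right "some-user/some-repo"`, but it fails on the full `git remote -v` line because the first character is not a colon. That is exactly the prefix that needs to be skipped.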

Is there a way to have parsec skip a character upon a failed match, and re-try from the next position?


Solution

  • If I've understood the question, try many (noneOf ":"). This consumes every character up to, but not including, the first ':' and then stops.

    Edit: It seems I had misunderstood the question. You can use the try combinator to turn a parser that may consume some characters before failing into one that consumes no characters when it fails. So:

    skipUntil :: Parser a -> Parser a
    skipUntil p = try p <|> (anyChar >> skipUntil p)
    

    Beware that this can be quite expensive, both in runtime (because it tries to match p at every position) and in memory (because try lets p backtrack, so the input read since p started cannot be garbage collected until p completes). You might be able to alleviate the first of these problems by parameterizing the anyChar part, so that the caller can choose some cheap parser for finding candidate positions, e.g.

    skipUntil :: Parser a -> Parser b -> Parser a
    skipUntil p skipper = try p <|> (skipper >> skipUntil p skipper)
    

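    You could then potentially use the many (noneOf ":") construction above as the skipper, so that after the first attempt p is only tried at positions that start with a :. Prefix it with anyChar (anyChar >> many (noneOf ":")), though: on its own it succeeds without consuming anything when it is already at a : where p failed, and skipUntil would then recurse forever.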
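
The two variants can be combined with a parser for the part after the colon in one self-contained sketch, run on the example line from the question. It assumes user and repository names consist only of letters, digits, `-` and `_`. The names `nameP`, `repoP`, `toNextColon`, `remoteLine` and `skipUntilWith` are hypothetical; `skipUntilWith` is simply the two-argument `skipUntil` renamed so that both versions fit in one module.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Try p here; if it fails (without consuming, thanks to try),
-- drop one character and try again at the next position.
skipUntil :: Parser a -> Parser a
skipUntil p = try p <|> (anyChar >> skipUntil p)

-- Same idea, but the caller picks how far to jump between attempts.
-- The skipper must consume at least one character, or this never terminates.
skipUntilWith :: Parser a -> Parser b -> Parser a
skipUntilWith p skipper = try p <|> (skipper >> skipUntilWith p skipper)

-- Cheap skipper: drop one character, then everything up to the next ':'.
toNextColon :: Parser ()
toNextColon = anyChar >> skipMany (noneOf ":")

-- Assumption: names contain only letters, digits, '-' and '_'.
nameP :: Parser String
nameP = many1 (alphaNum <|> oneOf "-_")

-- ":user/repo.git" -> "user/repo"
repoP :: Parser String
repoP = do
    _ <- char ':'
    user <- nameP
    _ <- char '/'
    repo <- nameP
    _ <- string ".git"
    return (user ++ "/" ++ repo)

-- Example line from the question.
remoteLine :: String
remoteLine = "origin git@github.com:some-user/some-repo.git (fetch)"
```

In GHCi, both `parse (skipUntil repoP) "" remoteLine` and `parse (skipUntilWith repoP toNextColon) "" remoteLine` evaluate to `Right "some-user/some-repo"`. The second one only attempts repoP at the start of the input and at each colon, instead of at every character.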