I would like to extract the repository name from the first line of git remote -v, which is usually of the form:
origin git@github.com:some-user/some-repo.git (fetch)
I quickly wrote the following parser using parsec:
-- | Parse the repository name from the output given by the first line of `git remote -v`.
repoNameFromRemoteP :: Parser String
repoNameFromRemoteP = do
    _ <- originPart >> hostPart
    _ <- char ':'
    firstPart <- many1 alphaNum
    _ <- char '/'
    secondPart <- many1 alphaNum
    _ <- string ".git"
    return $ firstPart ++ "/" ++ secondPart
  where
    originPart = many1 alphaNum >> space
    hostPart = many1 alphaNum
               >> (string "@" <|> string "://")
               >> many1 alphaNum `sepBy` char '.'
But this parser looks a bit awkward. Actually, I'm only interested in whatever follows the colon (":"), and it would be easier if I could just write a parser for that part.
Is there a way to have parsec skip a character upon a failed match, and retry from the next position?
If I've understood the question, try many (noneOf ":"). This will consume any character until it sees a ':', then stop.
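As a minimal sketch of that first suggestion, the parser below skips to the first colon with many (noneOf ":") and then parses the part of interest. The names repoNameAfterColonP and parseRepoName are hypothetical, and the sketch assumes the SSH-style remote shown in the question and that user and repository names contain only letters, digits, '-' and '_'.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Skip everything before the first ':', then parse "user/repo" followed by ".git".
-- Assumption: names consist only of letters, digits, '-' and '_'.
repoNameAfterColonP :: Parser String
repoNameAfterColonP = do
    _    <- many (noneOf ":")   -- consumes e.g. "origin git@github.com"
    _    <- char ':'
    user <- many1 nameChar
    _    <- char '/'
    repo <- many1 nameChar
    _    <- string ".git"
    return (user ++ "/" ++ repo)
  where
    nameChar = alphaNum <|> oneOf "-_"

-- | Hypothetical helper: run the parser on a single line of output.
parseRepoName :: String -> Either ParseError String
parseRepoName = parse repoNameAfterColonP "git remote -v"
```

This only works when the first colon in the line is the one in front of the repository path; an https:// remote would stop at the colon after the scheme and fail there.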
Edit: It seems I had not understood the question. You can use the try combinator to turn a parser that may consume some characters before failing into one that consumes no characters on failure. So:
skipUntil p = try p <|> (anyChar >> skipUntil p)
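To make the definition above concrete, here is a small self-contained sketch that pairs skipUntil with a parser for only the part after the colon. The names colonRepoP and findRepoName are hypothetical, and the character set allowed in names is an assumption.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Retry p at each successive input position until it succeeds.
-- Fails if the end of input is reached without a match.
skipUntil :: Parser a -> Parser a
skipUntil p = try p <|> (anyChar >> skipUntil p)

-- | Matches only the part of interest, ":user/repo.git".
-- Assumption: names consist only of letters, digits, '-' and '_'.
colonRepoP :: Parser String
colonRepoP = do
    _    <- char ':'
    user <- many1 nameChar
    _    <- char '/'
    repo <- many1 nameChar
    _    <- string ".git"
    return (user ++ "/" ++ repo)
  where
    nameChar = alphaNum <|> oneOf "-_"

-- | Hypothetical helper: find the repository name anywhere in a line.
findRepoName :: String -> Either ParseError String
findRepoName = parse (skipUntil colonRepoP) "git remote -v"
```

Because of try, a colon that is not followed by a user/repo path is simply skipped, and matching resumes at the next character.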
Beware that this can be quite expensive, both in runtime (because it will try matching p at every position) and in memory (because try prevents p from consuming characters and so the input cannot be garbage collected at all until p completes). You might be able to alleviate the first of those two problems by parameterizing the anyChar bit so that the caller can choose some cheap parser for finding candidate positions; e.g.
skipUntil p skipper = try p <|> (skipper >> skipUntil p skipper)
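You could then potentially use the above many (noneOf ":") construction to only try p on positions that start with a :. Note that the skipper has to consume at least one character each time it runs; otherwise the recursion never advances when p fails.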
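A sketch of how the two-argument version might be used follows. The skipper toNextColon and the names colonRepoP and findRepoName are hypothetical. The skipper is written as anyChar followed by skipMany (noneOf ":") rather than a bare many (noneOf ":"), on the assumption that it should always make progress, including when p has just failed on a colon.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Try p; on failure, let the skipper move to the next candidate position.
skipUntil :: Parser a -> Parser b -> Parser a
skipUntil p skipper = try p <|> (skipper >> skipUntil p skipper)

-- | Hypothetical skipper: drop one character, then everything up to
-- (but not including) the next ':'. Always consumes at least one character.
toNextColon :: Parser ()
toNextColon = anyChar >> skipMany (noneOf ":")

-- | Matches only the part of interest, ":user/repo.git".
-- Assumption: names consist only of letters, digits, '-' and '_'.
colonRepoP :: Parser String
colonRepoP = do
    _    <- char ':'
    user <- many1 nameChar
    _    <- char '/'
    repo <- many1 nameChar
    _    <- string ".git"
    return (user ++ "/" ++ repo)
  where
    nameChar = alphaNum <|> oneOf "-_"

-- | Hypothetical helper: p is only attempted at the start and at each ':'.
findRepoName :: String -> Either ParseError String
findRepoName = parse (skipUntil colonRepoP toNextColon) "git remote -v"
```

With this skipper, p is attempted once at the very start of the input and afterwards only at positions holding a colon, which avoids retrying it at every character.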