How do I skip specified symbols while parsing?


I am trying to write a trifecta parser that can parse all three of the phone numbers below. When I try to use parsePhone by calling parseString parsePhone mempty phoneNum2, the parser fails at the first dash and says it expected '('.

When I call the parser on phoneNum1, it fails at ')', saying it expected '('.

Why is my skipSymbol parser failing? I would think that due to my use of <|>, the parser would be fine with not detecting '(' and move on. Is the technique that I am attempting with skipSymbol bound to fail?

import Control.Applicative ((<|>))
import Control.Monad (replicateM)
import Text.Trifecta

phoneNum1 = "(123) 456 7890"
phoneNum2 = "123-456-7890"
phoneNum3 = "1234567890"

type NumberingPlanArea = Integer
type Exchange = Integer
type LineNumber = Integer

data PhoneNumber =
  PhoneNumber NumberingPlanArea
              Exchange LineNumber
  deriving (Eq, Show)

parse3digits :: Parser Integer
parse3digits = read <$> replicateM 3 digit

skipSymbol :: Parser ()
skipSymbol =
      skipMany (char '(')
  <|> skipMany (char ')')
  <|> skipMany (char '-')
  <|> skipMany (char ' ')

parsePhone :: Parser PhoneNumber
parsePhone =
  skipSymbol >>
  parse3digits >>=
    \area -> skipSymbol >>
    parse3digits >>=
      \exch -> skipSymbol >>
      integer >>=
        \line ->
          pure $ PhoneNumber area exch line

Solution

  • skipMany p applies the parser p zero-or-more times. Operationally, it goes something like this:

    1. Attempt to apply p.
    2. If p succeeded, repeat step 1.
    3. If p failed without consuming input, succeed and return (). (This is what the “zero” in “zero-or-more” means.)
    4. If p failed after consuming some input, report the failure.
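
The steps above can be checked with a small experiment. This is a minimal sketch, assuming the trifecta package is available; `run` and `afterParens` are hypothetical helper names introduced here for illustration and are not part of the question's code.

```haskell
import Text.Trifecta

-- Hypothetical helper: run a parser on a string, discarding error details.
run :: Parser a -> String -> Maybe a
run p s = case parseString p mempty s of
  Success a -> Just a
  Failure _ -> Nothing

-- Skip any number of '(' characters, then return the next character,
-- so we can see where skipMany stopped.
afterParens :: Parser Char
afterParens = skipMany (char '(') *> anyChar

main :: IO ()
main = do
  print (run afterParens "((x")  -- Just 'x': two '(' were skipped
  print (run afterParens "x")    -- Just 'x': zero '(' were skipped, still a success
```

The second call is the important one: skipMany reports success even though it matched nothing.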

    Let's look at how skipSymbol operates on the input ).

    1. The parser tries the left-hand choice of skipSymbol, namely skipMany (char '(').
    2. skipMany (char '(') attempts to apply char '(', which fails without consuming input because the input character is ).
    3. Because char '(' failed without consuming input, skipMany (char '(') succeeds without consuming input. This means the other choices in skipSymbol won't be attempted.
    4. The current input character is still ) (which is what causes parse3digits to later fail).
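
This walk-through can be reproduced directly. The sketch below copies the question's skipSymbol under the hypothetical name `brokenSkip` and reads one character after it, assuming the trifecta package is available; `run` is also a helper introduced here.

```haskell
import Control.Applicative ((<|>))
import Text.Trifecta

-- Hypothetical helper: run a parser on a string, discarding error details.
run :: Parser a -> String -> Maybe a
run p s = case parseString p mempty s of
  Success a -> Just a
  Failure _ -> Nothing

-- The question's original definition: a choice between loops.
brokenSkip :: Parser ()
brokenSkip =
      skipMany (char '(')
  <|> skipMany (char ')')
  <|> skipMany (char '-')
  <|> skipMany (char ' ')

main :: IO ()
main = do
  -- The first alternative succeeds on zero matches, so ')' is never skipped.
  print (run (brokenSkip *> anyChar) ") 456")  -- Just ')'
  -- Only '(' is ever skipped, because only the first alternative runs.
  print (run (brokenSkip *> anyChar) "((1")    -- Just '1'
```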

    As noted in the comments, the fix is to change the definition of skipSymbol to

    skipSymbol :: Parser ()
    skipSymbol = skipMany $ choice [char c | c <- "()- "]
    

    This version loops a choice, rather than choosing between loops.
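
For completeness, here is a sketch of the whole parser with the corrected skipSymbol, run against the three phone numbers from the question. It assumes the trifecta package is available; `run` is a hypothetical helper introduced here, and parsePhone is rewritten in do-notation, which should be equivalent to the question's chain of >> and >>=.

```haskell
import Control.Monad (replicateM)
import Text.Trifecta

data PhoneNumber = PhoneNumber Integer Integer Integer
  deriving (Eq, Show)

-- Hypothetical helper: run a parser on a string, discarding error details.
run :: Parser a -> String -> Maybe a
run p s = case parseString p mempty s of
  Success a -> Just a
  Failure _ -> Nothing

parse3digits :: Parser Integer
parse3digits = read <$> replicateM 3 digit

-- Loop over a choice: each iteration may skip any one of the four symbols.
skipSymbol :: Parser ()
skipSymbol = skipMany $ choice [char c | c <- "()- "]

parsePhone :: Parser PhoneNumber
parsePhone = do
  skipSymbol
  area <- parse3digits
  skipSymbol
  exch <- parse3digits
  skipSymbol
  line <- integer
  pure (PhoneNumber area exch line)

main :: IO ()
main = mapM_ (print . run parsePhone)
  [ "(123) 456 7890"
  , "123-456-7890"
  , "1234567890"
  ]
-- Each line prints: Just (PhoneNumber 123 456 7890)
```

Since every alternative is a single character, skipMany (oneOf "()- ") would be a shorter way to write the same loop.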