Search code examples
haskellparsecmegaparsec

How far does "try" back track?


So ... I messed up a recording in CSV format:

23,95489,0,20,9888

Due to language settings floating point numbers were written with commas as seperator ... in a comma separated value file ...

Problem is that the file does not have a nice formatting for every float. Some have no point at all and the number of numbers behind the point varies too.

My idea was to build a MegaParsec parser that would try to read every possible floating point formatting, move on and if back track if it finds an error.

Eg for the example above:

  1. read 23,95489 -> good
  2. read 0,20 -> good (so far)
  3. read 9888 -> error (because value is too high for column (checked by guard))
  4. (back tracking to 2.) read 0 -> good again
  5. read 20,9888 -> good
  6. done

I've implemented that as (pseudo code here):

floatP = try pointyFloatP <|> unpointyFloatP

lineP = (,,) <$> floatP <* comma <*> floatP <* comma <*> floatP <* comma

My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?

And if so ... how would I go about implementing further back tracking?


Solution

  • How far does “try” back track?

    The parser try p consumes exactly as much input as p if p parses successfully, otherwise it does not consume any input at all. So if you look at that in terms of backtracking, it backtracks to the point where you were when you invoked it.

    My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?

    Yes, try does not "unconsume" input. All it does is to recover from a failure in the parser you give it without consuming any input. It does not undo the effects of any parsers that you've applied previously, nor does it affect subsequent parsers that you apply after try p succeeded.

    And if so ... how would I go about implementing further back tracking?

    Basically what you want is to not only know whether pointyFloatP succeeds on the current input, but also whether the rest of your lineP would succeed after successfully pointyFloatP - and if it doesn't you want to backtrack back to before you applied pointyFloatP. So basically you want the parser for the whole remaining line in the try, not just the float parser.

    To achieve that you can make floatP take the parser for the remaining line as an argument like this:

    floatP restP = try (pointyFloatP <*> restP) <|> unpointyFloatP <*> restP
    

    Note that this kind of backtracking isn't going to be very efficient (but I assume you knew that going in).