
How do I convert a MegaParsec "ParseErrorBundle" into a list of "SourcePos" and error messages


Using Megaparsec's parse function, I can run a parser and get a ParseErrorBundle if it fails.

I know that I can pretty-print the whole ParseErrorBundle with errorBundlePretty, which produces an error message for the entire parse failure, including line and column numbers.

I also know that I can get the individual ParseErrors from a ParseErrorBundle with bundleErrors (which returns a NonEmpty list), and that I can pretty-print each of them with either parseErrorPretty or parseErrorTextPretty.
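For reference, the part the question already has can be sketched as follows. This is a minimal example assuming megaparsec 9.x and a String input; the parser `aThenEof` and the helper `errorMessages` are hypothetical names. Each ParseError only carries an offset (counted in tokens from the start of the input), which is why a separate step is needed to turn it into a line and column.

```haskell
import qualified Data.List.NonEmpty as NE
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

-- Hypothetical toy parser: accepts exactly the input "a".
aThenEof :: Parser Char
aThenEof = char 'a' <* eof

-- Pair each error's raw offset (tokens from the start of the input)
-- with its message. Note: this is an offset, not a line/column SourcePos.
errorMessages :: ParseErrorBundle String Void -> [(Int, String)]
errorMessages bundle =
  [ (errorOffset e, parseErrorTextPretty e)
  | e <- NE.toList (bundleErrors bundle)
  ]

main :: IO ()
main =
  case parse aThenEof "<input>" "b" of
    Left bundle -> mapM_ print (errorMessages bundle)
    Right c -> print c
```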

I want to be able to run a parser and, if it fails, get a list of (SourcePos, Text) pairs, so that I know both the individual error messages and the location of each error. I can't figure out an elegant way to do this. In theory I could crib fairly heavily from the source code of errorBundlePretty, but folding over the errors and using reachOffset to advance the PosState can't be the easiest way to go about this, can it?
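The hand-rolled fold the question describes might look roughly like the sketch below. It assumes megaparsec 9.x, where reachOffsetNoLine is a method of TraversableStream; it uses reachOffsetNoLine rather than reachOffset because only the SourcePos is needed, not the text of the offending line. `positionsByHand` and `aLines` are hypothetical names. The key point is that the PosState returned by each step is fed into the next one via mapAccumL; the errors are sorted by offset first because the sketch assumes it only ever needs to move the PosState forward.

```haskell
import Data.List (mapAccumL)
import qualified Data.List.NonEmpty as NE
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, newline)

-- Hypothetical toy parser: zero or more lines containing just "a".
aLines :: Parsec Void String String
aLines = many (char 'a' <* newline) <* eof

-- Hand-rolled traversal: visit errors in ascending offset order and thread
-- a single PosState through, so each reachOffsetNoLine call continues from
-- the previous error's position instead of rescanning from the start.
positionsByHand ::
  (TraversableStream s, VisualStream s, ShowErrorComponent e) =>
  ParseErrorBundle s e ->
  [(SourcePos, String)]
positionsByHand bundle = snd (mapAccumL step (bundlePosState bundle) errs)
  where
    errs = NE.toList (NE.sortWith errorOffset (bundleErrors bundle))
    step pst err =
      let pst' = reachOffsetNoLine (errorOffset err) pst
       in (pst', (pstateSourcePos pst', parseErrorTextPretty err))

main :: IO ()
main =
  case parse aLines "<input>" "a\na\nb\n" of
    Left bundle -> mapM_ print (positionsByHand bundle)
    Right xs -> print xs
```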


Solution

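  • Note that if you're using megaparsec >= 7.0.0, I think you're supposed to use attachSourcePos for this traversal. Given a function that extracts each error's offset, a Traversable collection of errors, and a starting PosState, it returns the same collection with each error paired with its SourcePos, together with the final PosState. Applied to a bundle, that gives a NonEmpty of (ParseError, SourcePos) pairs. I think it would look like: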

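    import qualified Text.Megaparsec as MP
    import qualified Data.Text as T
    import Data.List.NonEmpty (NonEmpty (..))
    import Data.Void
    
    -- Pair every error in the bundle with its source position and its
    -- message (rendered without the position or the offending line; the
    -- text from parseErrorTextPretty ends with a newline).
    annotateErrorBundle :: MP.ParseErrorBundle T.Text Void -> NonEmpty (MP.SourcePos, T.Text)
    annotateErrorBundle bundle
      = fmap (\(err, pos) -> (pos, T.pack . MP.parseErrorTextPretty $ err)) . fst $
        -- attachSourcePos threads the bundle's PosState through all errors;
        -- fst discards the final PosState it returns alongside them.
        MP.attachSourcePos MP.errorOffset
                           (MP.bundleErrors bundle)
                           (MP.bundlePosState bundle)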
    

    Note that, unlike your proposed answer, attachSourcePos threads the PosState through the whole traversal of the error bundle, rather than throwing the updated state away after every reachOffset call. As a result, I believe it will be more efficient when there are many errors. (It also uses reachOffsetNoLine instead of reachOffset, which may be more efficient for some stream types.)

    If you're using megaparsec < 7.0.0, you could try adapting the source of attachSourcePos from a later version.
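A possible end-to-end use of annotateErrorBundle is sketched below, assuming megaparsec 9.x (for attachSourcePos, withRecovery and registerParseError). The grammar (`digitLine`, `document`) and the input are hypothetical, chosen only so that one parse run produces more than one error; with a plain parser that stops at the first failure, the bundle normally holds a single error.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Void (Void)
import qualified Text.Megaparsec as MP
import qualified Text.Megaparsec.Char as MPC

type Parser = MP.Parsec Void T.Text

-- The function from the answer above.
annotateErrorBundle :: MP.ParseErrorBundle T.Text Void -> NonEmpty (MP.SourcePos, T.Text)
annotateErrorBundle bundle =
  fmap (\(err, pos) -> (pos, T.pack (MP.parseErrorTextPretty err))) . fst $
    MP.attachSourcePos MP.errorOffset (MP.bundleErrors bundle) (MP.bundlePosState bundle)

-- Hypothetical demo grammar: every line must be a single digit. A bad line
-- is recorded with registerParseError and skipped, so a single run can
-- collect several errors into one bundle.
digitLine :: Parser (Maybe Char)
digitLine = MP.withRecovery recover (Just <$> MPC.digitChar <* MPC.newline)
  where
    recover :: MP.ParseError T.Text Void -> Parser (Maybe Char)
    recover err = do
      MP.registerParseError err
      _ <- MP.takeWhileP Nothing (/= '\n') -- skip the rest of the bad line
      _ <- MPC.newline
      pure Nothing

document :: Parser [Maybe Char]
document = MP.many digitLine <* MP.eof

main :: IO ()
main =
  case MP.parse document "<demo>" "1\nx\n3\n45\n" of
    Right ds -> print ds
    Left bundle ->
      -- One output line per error: "file:line:column: message".
      mapM_
        (\(pos, msg) -> TIO.putStrLn (T.pack (MP.sourcePosPretty pos) <> ": " <> T.intercalate "; " (T.lines msg)))
        (annotateErrorBundle bundle)
```

Because the text from parseErrorTextPretty can span several lines (typically an "unexpected" line and an "expecting" line), the sketch joins them with "; " for one-line output; keeping the raw Text is equally reasonable if the caller formats messages differently.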