
How to get source range of AST nodes using megaparsec?


I'm trying to generate a source map for a source file I'm parsing, and I want the source range of each node. getSourcePos only gives the start position of a node (src:line:column). How do I get its end position?


Solution

  • If you want to construct a source span like this for each lexeme:

    data Span = Span SourcePos SourcePos
    
    data Spanned a = Spanned Span a
    

    You can just call getSourcePos twice, once at the beginning of a token and once at the end, before consuming any whitespace, assuming you’re at the lexing stage. I’ve used a structure like this in the past to make this more convenient:

    -- Augment a parser with a source span.
    spanned :: Parser (a, SourcePos) -> Parser (Spanned a)
    spanned parser = do
      start <- getSourcePos
      (x, end) <- parser
      pure (Spanned (Span start end) x)
    
    -- Consume whitespace following a lexeme, but record
    -- its endpoint as being before the whitespace.
    lexeme :: Parser a -> Parser (a, SourcePos)
    lexeme parser = (,) <$> parser <*> (getSourcePos <* whitespace)
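
For reference, the two helpers can be assembled into a module that compiles on its own. This is only a minimal sketch: the Parser alias (String input, no custom errors), the definition of whitespace, and the identifier and identifiers helpers are assumptions added for illustration and are not part of the answer above.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar, space)

-- Assumed parser type: String input, no custom error component.
type Parser = Parsec Void String

data Span = Span SourcePos SourcePos
  deriving (Eq, Show)

data Spanned a = Spanned Span a
  deriving (Eq, Show)

-- Hypothetical whitespace consumer; a real lexer would likely skip comments too.
whitespace :: Parser ()
whitespace = space

-- Augment a parser with a source span.
spanned :: Parser (a, SourcePos) -> Parser (Spanned a)
spanned parser = do
  start <- getSourcePos
  (x, end) <- parser
  pure (Spanned (Span start end) x)

-- Consume whitespace following a lexeme, but record
-- its endpoint as being before the whitespace.
lexeme :: Parser a -> Parser (a, SourcePos)
lexeme parser = (,) <$> parser <*> (getSourcePos <* whitespace)

-- Hypothetical token parser: a run of letters with its span.
identifier :: Parser (Spanned String)
identifier = spanned (lexeme (some letterChar))

-- Hypothetical entry point: parse every identifier in the input.
identifiers :: String -> Either (ParseErrorBundle String Void) [Spanned String]
identifiers = parse (whitespace *> many identifier <* eof) "<input>"
```

With the input "foo  bar" (two spaces), identifiers should return two spans on line 1, covering columns 1 to 4 and 6 to 9. The end position is the position just past the last character of the token, so the trailing whitespace is excluded.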
    

    Bear in mind that getSourcePos is somewhat costly, per the documentation; if I recall correctly, its cost depends on the size of the source file.
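
If that cost turns out to matter, one possible alternative (not part of the answer above) is to record plain offsets with getOffset, which megaparsec documents as cheap, and to translate them into line and column numbers only when they are actually needed. The sketch below assumes String input; OffsetSpan, spannedByOffset, lineCol and wordsWithOffsets are hypothetical names introduced for illustration.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar, space)

-- Assumed parser type: String input, no custom error component.
type Parser = Parsec Void String

-- Hypothetical offset-based span: half-open range [start, end) of token offsets.
data OffsetSpan = OffsetSpan Int Int
  deriving (Eq, Show)

-- Run a parser and record the offsets before and after it,
-- then skip trailing whitespace so it is not part of the span.
spannedByOffset :: Parser a -> Parser (OffsetSpan, a)
spannedByOffset parser = do
  start <- getOffset
  x <- parser
  end <- getOffset
  space
  pure (OffsetSpan start end, x)

-- Hypothetical helper: convert a character offset into a 1-based
-- (line, column) pair by scanning the original input.
-- Assumption: every character, including a tab, is one column wide.
lineCol :: String -> Int -> (Int, Int)
lineCol input offset = foldl step (1, 1) (take offset input)
  where
    step (l, _) '\n' = (l + 1, 1)
    step (l, c) _    = (l, c + 1)

-- Hypothetical entry point: every run of letters with its offset span.
wordsWithOffsets :: String -> Either (ParseErrorBundle String Void) [(OffsetSpan, String)]
wordsWithOffsets =
  parse (space *> many (spannedByOffset (some letterChar)) <* eof) "<input>"
```

The trade-off is that the AST then carries offsets rather than ready-made positions, so whatever consumes the spans needs access to the original input to resolve them.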

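    If an AST is annotated with spans, you can compute the span of any part of the tree by folding over it with a Semigroup instance for Span (a Monoid would also need an empty span as its identity) that takes the union of two spans, or more precisely their bounding box. That is, a <> b is the span from (beginRow a, beginCol a) `min` (beginRow b, beginCol b) to (endRow a, endCol a) `max` (endRow b, endCol b), where the tuples are compared lexicographically and beginRow, beginCol, endRow and endCol are shorthand for the line and column of each endpoint.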
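
The bounding-box combination described above could be sketched as follows. It relies on the derived Ord instance of SourcePos, which compares the file name first, then the line, then the column, so min and max give the expected result for positions within a single file. The helpers at and spanOf are hypothetical names used only for this example.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import Data.Semigroup (sconcat)
import Text.Megaparsec.Pos (SourcePos (..), mkPos)

data Span = Span SourcePos SourcePos
  deriving (Eq, Show)

-- Bounding box of two spans: earliest start, latest end.
-- Assumption: both spans refer to the same source file.
instance Semigroup Span where
  Span s1 e1 <> Span s2 e2 = Span (min s1 s2) (max e1 e2)

-- Hypothetical helper: build a position in a fixed file from line and column.
at :: Int -> Int -> SourcePos
at line col = SourcePos "<input>" (mkPos line) (mkPos col)

-- Hypothetical helper: the span of a node given the spans of its children.
-- NonEmpty is used because there is no sensible span for zero children.
spanOf :: NonEmpty Span -> Span
spanOf = sconcat
```

For instance, spanOf (Span (at 2 1) (at 2 3) :| [Span (at 1 9) (at 1 12)]) gives Span (at 1 9) (at 2 3), regardless of the order in which the children are listed.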