My question is inspired by this one, but for javascript, using the parsimmon parser-combinator library. I want to parse an indentation-sensitive language, like python or yaml.
I've managed to convert the scala example in that answer to javascript easily enough - the key is the chain
function in parsimmon, which is equivalent to the >>
operator in scala's parser combinators - they both take a parser and a function that returns a parser, and the result of the first parser is passed to the function to choose the next parser.
However, I can't quite wrap my head around how to make this recursive. The example is for a single block - I don't see how to create nested blocks, tracking de-dent levels as needed to parse something like python.
Well, here's one way to do it. It's probably not the best, and it's definitely not intuitive (I'm not sure I understand why it works, and I wrote it :) but it seems to be pretty robust.
Basically it says: a tree
is a line
optionally followed by a block
. A block
, in turn, is an indented list of tree
s.
indent
is a function that takes the current indentation level, and returns a parser that expects a line indented more than that. The returned parser returns a stack of previous indentation levels.
I said before it's robust - in fact, it's too robust. It accepts input that really ought to throw an error instead: if you un-indent to an indentation level that doesn't match a previous level, it basically "rounds up" to the next indentation level. I'm not sure where the logic to fix that should go - mutual recursion mixed with parser "chain"-ing is hard to follow!
var {regex, seq} = Parsimmon;
function tree(state) {
return seq(
line,
block(state).atMost(1).map(x => x[0]? x[0] : [])
);
}
function block(state) {
return indent(state)
.chain(tree).atLeast(1);
}
function indent(state) {
return regex(/\s/).atLeast(state + 1)
.map(x => x.length)
.desc('indent');
}
let item = regex(/[^\s].*/).desc('item');
let line = item.skip(regex(/\n?/));
let start = block(-1);
let result = start.parse('top item 1\n sub item 1\n sub item 2\n' +
' even deeper\n sub item 3\ntop item 2');
console.log(JSON.stringify(result['value'], null, 2));
<script src="https://cdn.rawgit.com/jneen/parsimmon/master/src/parsimmon.js"></script>