Search code examples
parsingrecursionclojure

Avoiding stack overflow from recursion in Clojure


I'm new to Clojure and am having trouble figuring out how to avoid stack overflows in certain situations. One such situation came up while trying to port a parsing project to Clojure with a parser combinator library I found called kern.

Kern defines a recursive implementation for a "many-till" parser: source

This works fine for small inputs:

(def input-text "The blue {cat} and the red {dog} became best friends with the white {wolf} END {not included}")

(def  between-brackets "parser that grabs all text between brackets"
  (between (sym* \{) (sym* \}) (<+> (many (none-of* "}")))))

(def parse-enclosed-words "parser that attempts to grab text between brackets, 
skips a character when it can't, and halts when it hits the string END"
  (many-till (<|> between-brackets (skip any-char)) (token* "END")))

(filter #(some? %) (value parse-enclosed-words input-text)) ;; => ("cat" "dog" "wolf")

Unfortunately, the parser encounters stack overflows as input strings grow:

(def file-input-text (slurp (io/resource "[input-text-20x-repeated.txt][2]") ))
(filter #(some? %) (value parse-enclosed-words file-input-text)) ;; => Unhandled java.lang.StackOverflowError

From what I can tell by reading online, this is probably due to the function using stack-consuming recursion. I played around with some attempts to re-write the function using the "recur" keyword, but since the recursive call is not in tail position this didn't seem to work.

How can many-till be modified to avoid stack overflow?


Solution

  • I don't think you can do it with the abstractions the library provides. This is a very natural definition of many-till, and would work fine in Parsec, the library that Kern is obviously inspired by. But Clojure doesn't have Haskell's lazy evaluation and automatic trampolining, so the nested lambdas that many-till builds inevitably consume unbounded stack. You would need an implementation more like its implementation of many, which builds a parser by hand out of a function. I would include its source below, but I don't own it and so I don't think I am authorized to give Stack Overflow a CC BY-SA 4.0 license to it, as posting it would do. Instead, here is a link to its source.