clojure immutable binary search tree insertion

I'm currently trying to implement an immutable BST in clojure. This is my make-tree function:

(defn make-tree [v] {:v v :l nil :r nil})

and insertion:

(defn insert [tree v]
  (if (nil? tree)
    (make-tree v)
    (case (compare v (tree :v))
      -1 (assoc tree :l (insert (:l tree) v))
      0 tree
      1 (assoc tree :r (insert (:r tree) v)))))

The thing is, this insert function will overflow with something like

(reduce insert (make-tree 1) (range 10000))

I understand that I can have the tree balanced so that I will hardly need more than 1000 depth. I'm still curious if there is anyway to define the function so it won't overflow.

Since the mutable version just modify the node thus no need to store parent nodes， which seems convenient.

What will you chose in real life? mutable or immutable?

Solution

To add to A. Webb's answer:

While a purely functional implementation would need to be written in CPS, you could implement the trees as a functional wrapper + mutable nodes, where a modification would proceed as follows:

Wrapper creates a new empty root and tells the current root to perform the modification and put the results in the new root.
If new root determines its own value is to be modified, it copies its children to the new root, puts the new value in the new root and returns.
Otherwise it copies its value and the branch which needs no modification to the new root and installs a new (blank) node in the new root in place of the branch which does (or rather, may) require modification.
The current root tells the branch in need of modification to modify itself, also handing it the new blank node.
The branch acts as the root in repeating steps 2-5.
The wrapper creates a new wrapper with the new root. The new wrappers is returned as the result of the top-level operation.

The points 2-5 above describe a recursive procedure, but it's actually tail-recursive, so can be rewritten as a loop.

After it's all done, the old wrapper is, of course, perfectly fine and still holds the same tree (since all the mutations involved the new nodes only).

In fact, many of the Clojure data structures use contained mutability (mostly involving arrays) all the time. (Not the tree map, though; the actual pattern of using mutability is different, too, but the basic premise of using contained mutability to speed things up on the inside while maintaining a functional interface on the outside is similar.)

I'll also add two tangential remarks:

Firstly, your implementation assumes compare will return one of -1, 0, 1, where in fact it's free to return any negative number for "less than" and any positive number for "greater than" ((compare "foo" "bar") evaluates to 4 at my REPL -- and likely any JVM REPL, although the exact value is, again, unspecified).

Secondly, if you're interested in real life examples of self-balancing trees implemented in Clojure, you might want to have a look at sorted.clj, which is an implementation of Clojure's PersistentTreeMap and PersistentTreeSet in Clojure. It's based on the ClojureScript implementation, which in turn is a port of Clojure's Java implementation. At this point, I'd say it's purely experimental¹, but it does seem to work (passes the CLJS test suite, does what I'd expect at the REPL) and may be useful as an example of what a simple PDS implementation might look like in Clojure.

¹ Basically after writing the ClojureScript implementation I wanted to see how much extra work it would take to produce a Clojure implementation of the same data structure. It's clear there will be some extra work, as there are Java interfaces to be implemented, some array handling code needs to be tweaked etc., but I was hoping it wouldn't add up to too much. I'm happy to report that that basic expectation was borne out by experience. Performance-wise it's not quite up to the standard of the Java implementation (I seem to remember a 1.5x slowdown in most benchmarks, though I'll have to recheck that); I'm hoping to improve that eventually, though core.rrb-vector is a higher priority for perf tuning at this time.