Search code examples
clojurespecter

Select elements from a nested structure that match a condition in Clojure


I recently discovered the Specter library that provides data-structure navigation and transformation functions and is written in Clojure.

Implementing some of its API as a learning exercise seemed like a good idea. Specter implements an API taking a function and a nested structure as arguments and returns a vector of elements from the nested structure that satisfies the function like below:

(select (walker number?) [1 :a {:b 2}]) => [1 2]

Below is my attempt at implementing a function with similar API:

(defn select-walker [afn ds]
  (vec (if (and (coll? ds) (not-empty ds))
         (concat (select-walker afn (first ds)) 
                 (select-walker afn (rest ds)))
         (if (afn ds) [ds]))))

(select-walker number? [1 :a {:b 2}]) => [1 2]

I have tried implementing select-walker by using list comprehension, looping, and using cons and conj. In all these cases the return value was a nested list instead of a flat vector of elements.

Yet my implementation does not seem like idiomatic Clojure and has poor time and space complexity.

(time (dotimes [_ 1000] (select (walker number?) (range 100))))
"Elapsed time: 19.445396 msecs"

(time (dotimes [_ 1000] (select-walker number? (range 100))))
"Elapsed time: 237.000334 msecs"

Notice that my implementation is about 12 times slower than Specter's implementation.

I have three questions on the implemention of select-walker.

  1. Is a tail-recursive implementaion of select-walker possible?
  2. Can select-walker be written in more idiomatic Clojure?
  3. Any hints to make select-walker execute faster?

Solution

    1. there are at least two possibilities to make it tail recursive. First one is to process data in loop like this:

      (defn select-walker-rec [afn ds]
        (loop [res [] ds ds]
          (cond (empty? ds) res
                (coll? (first ds)) (recur res 
                                          (doall (concat (first ds) 
                                                         (rest ds))))
                (afn (first ds)) (recur (conj res (first ds)) (rest ds))
                :else (recur res (rest ds)))))
      

      in repl:

      user> (select-walker-rec number? [1 :a {:b 2}])
      [1 2]
      
      user> user> (time (dotimes [_ 1000] (select-walker-rec number? (range 100))))
      "Elapsed time: 19.428887 msecs"
      

      (simple select-walker works about 200ms for me)

      the second one (slower though, and more suitable for more difficult tasks) is to use zippers:

      (require '[clojure.zip :as z])
      
      (defn select-walker-z [afn ds]
        (loop [res [] curr (z/zipper coll? seq nil ds)]
          (cond (z/end? curr) res
                (z/branch? curr) (recur res (z/next curr))
                (afn (z/node curr)) (recur (conj res (z/node curr))
                                           (z/next curr))
                :else (recur res (z/next curr)))))
      
      user> (time (dotimes [_ 1000] (select-walker-z number? (range 100))))
      "Elapsed time: 219.015153 msecs"
      

      this one is really slow, since zipper operates on more complex structures. It's great power brings unneeded overhead to this simple task.

    2. the most idiomatic approach i guess, is to use tree-seq:

      (defn select-walker-t [afn ds]
        (filter #(and (not (coll? %)) (afn %))
                (tree-seq coll? seq ds)))
      
      user> (time (dotimes [_ 1000] (select-walker-t number? (range 100))))
      "Elapsed time: 1.320209 msecs"
      

      it is incredibly fast, as it produces a lazy sequence of results. In fact you should realize its data for the fair test:

      user> (time (dotimes [_ 1000] (doall (select-walker-t number? (range 100)))))
      "Elapsed time: 53.641014 msecs"
      

      one more thing to notice about this variant, is that it's not tail recursive, so it would fail in case of really deeply nested structures (maybe i'm mistaken, but i guess it's about couple of thousands levels of nesting), still it's suitable for the most cases.