Search code examples
clojurejvmout-of-memory

Getting unexpected sequence realization in Clojure


Noticed something odd while trying to troubleshoot a complex computation. I had an odd error so I started to build up the computation incrementally and pretty early on discovered that a very very long seq was being unexpectedly realized. I have a combinatoric seq of lists like so:

(0 1 2 3)
(0 1 2 4)
(0 1 2 5)

for n-choose-k for n = 143 and k = 4 somewhat unoriginally named "combos". I plan to manipulate this as a seq to keep the memory consumption reasonable but this failed:

(def combos (combinations 4 143))
(def semantically-also-combos
  (filter nil? (map identity combos)))

;; this prints instantly and uses almost no memory, as expected
(println (first combos)) ; prints (0 1 2 3)

;; this takes minutes and runs the JVM out of memory
;; without printing anything
(println (first semantically-also-combos))

According to type they are both clojure.lang.LazySeq but one works as expected and the other crashes the process. Why does the entire seq get realized just to run it through the identity function and check if it's nil?

Complete code to reproduce

(ns my-project.core
  (:gen-class))

;;; Copied from rosetta code
(defn combinations
  "If m=1, generate a nested list of numbers [0,n)
   If m>1, for each x in [0,n), and for each list in the recursion on [x+1,n), cons the two"
  [m n]
  (letfn [(comb-aux
            [m start]
            (if (= 1 m)
              (for [x (range start n)]
                (list x))
              (for [x (range start n)
                    xs (comb-aux (dec m) (inc x))]
                (cons x xs))))]
    (comb-aux m 0)))

(println "Generating combinations...")
(def combos (combinations 4 143))

(def should-also-be-combos
  (filter nil? (map identity combos)))

(defn -main
  "Calculates combos"
  [& _args]
  (println (type combos))
  (println (type should-also-be-combos)))

Solution

  • Why don't you check that (filter nil? (map identity combos)) actually returns the result you would expect for much smaller arguments passed to combinations? I did that. This is what combinations returns with arguments 2 and 5:

    (combinations 2 5)
    ;; => ((0 1) (0 2) (0 3) (0 4) (1 2) (1 3) (1 4) (2 3) (2 4) (3 4))
    

    This is what we get with the extra lazy sequence operations of the example:

    (filter nil? (map identity (combinations 2 5)))
    ;; => ()
    

    What (filter nil? ...) does is to keep all elements that satisfy the predicate. And none of the elements in the input sequence is nil?, so it will scan the entire input sequence and will find none. Maybe you wanted to use (remove nil? ...), which will remove elements satisfying a predicate?

    (remove nil? (map identity (combinations 2 5)))
    ;; => ((0 1) (0 2) (0 3) (0 4) (1 2) (1 3) (1 4) (2 3) (2 4) (3 4))
    

    Going back to the original example, this is what I get using remove:

    (first (combinations 4 143))
    ;; => (0 1 2 3)
    
    (first (remove nil? (map identity (combinations 4 143))))
    ;; => (0 1 2 3)
    

    My general advice is to start testing your function with "smaller" data such as 2 and 5, before going to "bigger" data such as 4 and 143.