Search code examples
clojurecore.async

Why do I have memory leak for the following code with channel sub/unsub?


I am using [org.clojure/clojure "1.10.1"],[org.clojure/core.async "1.2.603"] and the latest Amazon Corretto 11 JVM if there was anything to do with them.

The following code is a simplified version of the code used in production and it does cause memory leak. I have no idea why that happened but suspect it might due to sub/unsub of channels. Can anyone help point out where my code may go wrong or how I can fix the memory leak?

(ns test-gc.core
  (:require [clojure.core.async :as a :refer [chan put! close! <! go >! go-loop timeout]])
  (:import [java.util UUID]))

(def global-msg-ch (chan (a/sliding-buffer 200)))

(def global-msg-pub (a/pub global-msg-ch :id))

(defn io-promise []
  (let [id (UUID/randomUUID)
        ch (chan)]
    (a/sub global-msg-pub id ch)
    [id (go
          (let [x (<! ch)]
            (a/unsub global-msg-pub id ch)
            (:data x)))]))

(defn -main []
  (go-loop []
    (<! (timeout 1))
    (let [[pid pch] (io-promise)
          cmd {:id   pid
               :data (rand-int 1E5)}]
      (>! global-msg-ch cmd)
      (println (<! pch)))
    (recur))
  (while true
    (Thread/yield)))

A quick heap dump gives the following statistics for example:

  • Class by number of instances

    • java.util.LinkedList 5,157,128 (14.4%)
    • java.util.concurrent.atomic.AtomicReference 3,698,382 (10.3%)
    • clojure.lang.Atom 3,094,279 (8.6%)
    • ...
  • Class by size of instances

    • java.lang.Object[] 210,061,752 B (13.8%)
    • java.util.LinkedList 206,285,120 B (13.6%)
    • clojure.lang.Atom 148,525,392 B (9.8%)
    • clojure.core.async.impl.channels.ManyToManyChannel 132,022,336 B (8.7%)
    • ...

Solution

  • I finally figured out why. By looking at the source code, we get the following segment:

    (defn pub
      "Creates and returns a pub(lication) of the supplied channel, ..."
      ...
         (let [mults (atom {}) ;;topic->mult
               ensure-mult (fn [topic]
                             (or (get @mults topic)
                                 (get (swap! mults
                                             #(if (% topic) % (assoc % topic (mult (chan (buf-fn topic))))))
                                      topic)))
               p (reify
                  Mux
                  (muxch* [_] ch)
    
                  Pub
                  (sub* [p topic ch close?]
                        (let [m (ensure-mult topic)]
                          (tap m ch close?)))
                  (unsub* [p topic ch]
                          (when-let [m (get @mults topic)]
                            (untap m ch)))
                  (unsub-all* [_] (reset! mults {}))
                  (unsub-all* [_ topic] (swap! mults dissoc topic)))]
           ...
           p)))
    

    We can see mults stores all topic hence shall increase monotonically if we do not clear it. We may add something like (a/unsub-all* global-msg-pub pid) to fix that.