Alternatives to mutation in this scenario?


I wrote a computationally intensive function (get-cand-info, below) that will be called from pre-existing Clojure code written by others.

(defn get-cand-info [model tuple]  ; my code which operates on 'tuple' and a hash-map called 'model'
 ; ....
   cand-info)

;; how my code get-cand-info is going to be called
(defn get-cand-scores [model]
  (let [tuples   (make-tuples model)]    
    (filter identity
        (pmap #(get-cand-info model %) tuples))))

(defn select-cand [model]
  (let [cands-with-scores   (get-cand-scores model)]
    ; Logic to work on cands-with-scores; finally returns one of
    ; the cand-info but not the model
    ))

After writing the new get-cand-info function, I realized that it produces identical results hundreds of times within a single end-user session, which is a real waste of resources.

Naturally, I considered memoize, but I didn't want to pay for the increased memory usage over the entire life of the program: across all user sessions the cache could hold a lot of unique data, and data from one user session is not valid for another anyway. The 'model' parameter to my function seemed the perfect place to cache the results of get-cand-info, since it stores data for a single session.
However, returning an updated model from my function would change its contract. If I changed the contract to return a new 'model' map with the new result assoc'ed into it, I would need to update code all the way up the call stack, which means changes to a lot of functions, and that is something I want to avoid.

So I decided to keep an atom in the model and mutate it from my code:

(defn get-cand-info [model tuple]
  ; Fetch the cand-info from the model if available there
  (if-let [cand-info   ((deref (:cand-info model)) tuple)]
     cand-info
     ; Else calculate the cand-info,
     ; ....
     ; store it in the model and return it
     (do
       (swap! (:cand-info model) assoc tuple cand-info)
       cand-info)))
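
For reference, here is a minimal runnable sketch of the same per-session cache. It assumes a hypothetical compute-cand-info standing in for the elided calculation and a hypothetical new-model constructor; neither is part of the original code. Because get-cand-scores filters out falsy results, the sketch looks entries up with find, so a cached nil ("no candidate") is not recomputed on every call.

```clojure
;; Hypothetical stand-in for the expensive computation elided above.
;; Returns nil for some tuples to mimic "no candidate" results.
(defn compute-cand-info [model tuple]
  (when (even? (first tuple))
    {:tuple tuple :score (reduce + tuple)}))

;; Hypothetical constructor: each session's model carries its own
;; cache atom under :cand-info, so the cache dies with the model.
(defn new-model []
  {:cand-info (atom {})})

(defn get-cand-info [model tuple]
  (let [cache (:cand-info model)]
    ;; find returns a map entry even when the cached value is nil,
    ;; so falsy results are served from the cache as well.
    (if-let [entry (find @cache tuple)]
      (val entry)
      (let [cand-info (compute-cand-info model tuple)]
        (swap! cache assoc tuple cand-info)
        cand-info))))
```

Under pmap, two threads may occasionally compute the same tuple before either has stored it. With a pure computation this only costs some duplicated work; swap! keeps the map itself consistent.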

This does the job, but it leaves me wondering:

1) Is there a better, more clojurey way of solving the problem?

2) Is the mutation likely to result in any performance penalty or other drawback? (I don't have large datasets yet to test the performance).

Would appreciate any insights/comments.

P.S. User sessions typically last no longer than 5 minutes, and the data cached by get-cand-info per session would be under 200 MB, which can be GCed as soon as the session is over.


Solution

  • I would do it just like you propose. No need to use dosync & alter with a ref for this. Just use a local atom within each model. It can then be GC'd when the model is no longer being used.


    Update

    One alternative from Java is to use a LinkedHashMap. You can set a max size and override the removeEldestEntry() method to control eviction behavior.


    Note that this syntax is slightly off, but I'm sure you know how to fix it:

    ((deref...
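
The LinkedHashMap alternative from the update could be sketched roughly as follows. The lru-cache helper and its max-entries parameter are hypothetical names chosen for illustration. The map is created in access order so that the least recently used entry is evicted first, and it is wrapped in Collections/synchronizedMap on the assumption that it will be shared across pmap threads, since a plain LinkedHashMap is not thread-safe.

```clojure
(import '(java.util Collections LinkedHashMap))

;; Hypothetical helper: a size-bounded cache with LRU eviction.
(defn lru-cache ^java.util.Map [max-entries]
  (Collections/synchronizedMap
    ;; Constructor args: initial capacity, load factor, access-order flag.
    (proxy [LinkedHashMap] [(int 16) (float 0.75) true]
      ;; Called by LinkedHashMap after each insertion; returning true
      ;; makes it drop the eldest (least recently accessed) entry.
      (removeEldestEntry [eldest]
        (> (count this) max-entries)))))

(def ^java.util.Map cache (lru-cache 2))

(.put cache :a 1)
(.put cache :b 2)
(.get cache :a)     ; touching :a makes :b the eldest entry
(.put cache :c 3)   ; exceeds the limit of 2, so :b is evicted
```

Unlike the atom-based approach, this bounds memory per cache, at the cost of holding a mutable Java collection instead of an immutable Clojure map.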