Search code examples
performanceclojure

What are the performance bottlenecks in this code?


I have the following Clojure code:

(ns smallest-sum.core)

(defn next-transformation
  [arr]
  (loop [i 0
         j 0]
    (let [
          arr-len (count arr)
          ]
      (if (and (< i arr-len)
               (< j arr-len))
        (let [
                xi (nth arr i)
                xj (nth arr j)
                j-plus-1 (+ j 1)
                i-plus-1 (+ i 1)
                new-i (if (< j-plus-1 arr-len)
                       i
                       (+ i 1))
                new-j (if (< j-plus-1 arr-len)
                       (+ j 1)
                       0)
              ]
          (if (> xi xj)
              ;; We found it
              [i j] 
              ;; We haven't found it, recur 
              (recur new-i new-j)
            )
          )
          nil ; We are at the end of the  loop
        ) ; if 
      )
    ) ; loop 
  ) ; defn

(defn solution
  [arr]
  (loop [state {
                :arr arr 
                :sum 0
                }]
      (let [
            cur-arr (get state :arr) 
            trx (next-transformation cur-arr) 
            ]
         ; (println (str "========"))
         ; (println (str "cur-arr: " cur-arr))
         (if (not (= trx nil))
           ;; trx is not nil -- recur
           (let [
                  i (nth trx 0)
                  j (nth trx 1)
                  xi (nth cur-arr i)
                  xj (nth cur-arr j)
                  diff (- xi xj) 
                 ]
              (recur (assoc state :arr (assoc cur-arr
                                              i
                                              diff
                                 )
                           )
                    )            
             )
           ;; Here we need the sum
           (reduce + cur-arr)
           )
        )   
    )
)

This code must be able to process a large input (example see below) within 120000 milliseconds.

I assume that one problem are two loops (in solution and next-transformation) which can be unified into one.

Are there any other performance bottlenecks in this code?

Here is the test that fails because solution takes more than 120000 milliseconds to run:

(ns smallest-sum.test
  (:require [smallest-sum.core :refer :all]
            [clojure.test :refer :all]))
            
(deftest sample-test-cases
  (is (< 0 (solution [
             91293
             38437
             40626
             173
             76990
             17858
             43446
             25050
             10791
             68990
             52403
             21503
             52331
             51909
             73488
             91293
             38437
             40626
             173
             76990
             17858
             43446
             25050
             10791
             68990
             52403
             21503
             52331
             51909
             73488
             91293
             38437
             40626
             173
             76990
             17858
             43446
             25050
             10791
             68990
             52403
             21503
             52331
             51909
             73488
             91293
             38437
             40626
             173
             76990
             17858
             43446
             25050
             10791
             68990
             52403
             21503
             52331
             51909
             73488
                      ])  ))       
)

Update: After reading the answers as well as this question I rewrote the code as follows. Now it seems to work (is fast enough).

(defn gcd
  [a b]
  (if (= b 0)
    a
    (gcd b (mod a b)))
  )

(defn gcd-arr
  [arr]
  (reduce gcd arr) 
  )

(defn solution
  [arr]
  (let [
        greatest-common-divisor (gcd-arr arr)
        arr-len (count arr)
        ]
      (* arr-len greatest-common-divisor) 
    )
  )

Solution

  • I find this code hard to reason about. If you could provide a description of what the code is attempting to do, it might be possible to give better suggestions.

    But based on what I see running these two functions solution takes a vector and decreases the values somehow until they are all the same value. and then returns the size of the vector times that value that is the same. (Or the sum of the reduced values) next-transformation picks out two indexes from the vector that have different values, that solution then reduces the larger by the difference.


    One of the indexes returned by next-transformation is always zero.

    The result of next-transformation is:

    1. nil when all the elements in the vector are the same.
    2. [0 first-index-of-element-less-than-first] when one or more elements of the vector are less than the first element.
    3. else [first-index-of-element-greater-than-first 0].

    The above results can be calculated in a single pass instead of a nested loop. Something like:

    (defn nt-1 [arr]
      (let [f (first arr)]
        (loop [i 1
               bigger nil]
          (cond (>= i (count arr)) (if (nil? bigger) 
                                     nil 
                                     [bigger 0])
                (> f (arr i)) [0 i]
                :else (recur (inc i)
                             (cond (not (nil? bigger)) bigger
                                   (< f (arr i)) i
                                   :else nil))))))
    

    I don't understand solution well enough if the following would be correct, but another possibility may be to exit with the first element that is not equal:

    (defn nt-2 [arr]
      (let [f (first arr)]
        (loop [i 1]
          (cond (>= i (count arr)) nil
                (< f (arr i)) [i 0]
                (> f (arr i)) [0 i]
                :else (recur (inc i))))))
    

    Timing:

    ;; Using original next-transformation
    user> (time (solution (into [] (repeat 100000 10000))))
    "Elapsed time: 269826.483125 msecs"
    1000000000
    user> (def next-transformation nt-1)
    #'user/next-transformation
    user> (time (solution (into [] (repeat 1000000 10000))))
    "Elapsed time: 116.421625 msecs"
    10000000000
    user> (def next-transformation nt-2)
    #'user/next-transformation
    user> (time (solution (into [] (repeat 1000000 10000))))
    "Elapsed time: 87.211333 msecs"
    10000000000
    

    Using your test data:

    ;; Original next-transformation
    user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488 91293 38437 40626
                     173 76990 17858 43446 25050 10791 68990 52403 21503
                     52331 51909 73488 91293 38437 40626 173 76990 17858
                     43446 25050 10791 68990 52403 21503 52331 51909 73488
                     91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488]))
    "Elapsed time: 112973.434458 msecs"
    60
    user> (def next-transformation nt-1)
    #'user/next-transformation
    user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488 91293 38437 40626
                     173 76990 17858 43446 25050 10791 68990 52403 21503
                     52331 51909 73488 91293 38437 40626 173 76990 17858
                     43446 25050 10791 68990 52403 21503 52331 51909 73488
                     91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488]))
    "Elapsed time: 1630.157375 msecs"
    60
    

    Zero and negative numbers seemed to lead to infinite loop

    In the case of at least one positive number and at least one non-positive number solution appeared to enter an infinite loop. I am assuming the data being used are positive integers.


    Once (arr 0) is one, the result is (count arr).

    My gut says that usually one of the elements will be reduced to one, leading to the result to be (count arr) most of the time. My gut also says that is an uninteresting function to mostly, but not always, return (count arr) so I might be wrong.

    In the cases where the result is (count arr) exiting as soon as (= 1 (arr 0)) would provide a performance boost.

    (defn s-1 [arr]
      (loop [state {:arr arr :sum 0}]
          (let [cur-arr (get state :arr) 
                trx (next-transformation cur-arr)]
            (cond (nil? trx) (* (count cur-arr)
                                (nth cur-arr 0 0))
                  (= 1 (first cur-arr)) (count arr)
                  :else (let [i (nth trx 0)
                              j (nth trx 1)
                              xi (nth cur-arr i)
                              xj (nth cur-arr j)
                              diff (- xi xj)]
                          (recur (assoc state :arr (assoc cur-arr
                                                          i
                                                          diff))))))))
    

    Timing:

    user> (def solution s-1)
    ;; next-transformation has been restored to original
    user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488 91293 38437 40626
                     173 76990 17858 43446 25050 10791 68990 52403 21503
                     52331 51909 73488 91293 38437 40626 173 76990 17858
                     43446 25050 10791 68990 52403 21503 52331 51909 73488
                     91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488]))
    "Elapsed time: 9.182584 msecs"
    60
    #'user/next-transformation
    user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488 91293 38437 40626
                     173 76990 17858 43446 25050 10791 68990 52403 21503
                     52331 51909 73488 91293 38437 40626 173 76990 17858
                     43446 25050 10791 68990 52403 21503 52331 51909 73488
                     91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488]))
    "Elapsed time: 2.379 msecs"
    60
    

    Cleaning up solution, per my tastes:

    ;; :sum is never used. So there is no reason
    ;; to have `state` with two entries.
    ;; (Not that there ever was:
    ;;    `(loop [arr arr sum 0]...)`)
    ;; would have been fine.
    ;; With only `arr` changing, we can
    ;; recur to the function top, without
    ;; `loop`
    (defn s-2 [arr]
      (let [trx (next-transformation arr)]
        (cond (nil? trx) (* (count arr)
                            (nth arr 0 0)) ; Last `0` provides a default
              (= 1 (first arr)) (count arr)
              :else (let [[i j] trx ;destructuring
                          diff (- (arr i) (arr j))]
                      (recur (assoc arr i diff))))))
    

    Timing:

    user> (def solution s-2)
    #'user/solution
    user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488 91293 38437 40626
                     173 76990 17858 43446 25050 10791 68990 52403 21503
                     52331 51909 73488 91293 38437 40626 173 76990 17858
                     43446 25050 10791 68990 52403 21503 52331 51909 73488
                     91293 38437 40626 173 76990 17858 43446 25050 10791
                     68990 52403 21503 52331 51909 73488]))
    "Elapsed time: 2.116625 msecs"
    60
    

    Using time once is not great benchmarking, but given the size of the differences seen I believe it is still directional in this case.