I have the following Clojure code:
(ns smallest-sum.core)
(defn next-transformation
[arr]
(loop [i 0
j 0]
(let [
arr-len (count arr)
]
(if (and (< i arr-len)
(< j arr-len))
(let [
xi (nth arr i)
xj (nth arr j)
j-plus-1 (+ j 1)
i-plus-1 (+ i 1)
new-i (if (< j-plus-1 arr-len)
i
(+ i 1))
new-j (if (< j-plus-1 arr-len)
(+ j 1)
0)
]
(if (> xi xj)
;; We found it
[i j]
;; We haven't found it, recur
(recur new-i new-j)
)
)
nil ; We are at the end of the loop
) ; if
)
) ; loop
) ; defn
(defn solution
[arr]
(loop [state {
:arr arr
:sum 0
}]
(let [
cur-arr (get state :arr)
trx (next-transformation cur-arr)
]
; (println (str "========"))
; (println (str "cur-arr: " cur-arr))
(if (not (= trx nil))
;; trx is not nil -- recur
(let [
i (nth trx 0)
j (nth trx 1)
xi (nth cur-arr i)
xj (nth cur-arr j)
diff (- xi xj)
]
(recur (assoc state :arr (assoc cur-arr
i
diff
)
)
)
)
;; Here we need the sum
(reduce + cur-arr)
)
)
)
)
This code must be able to process a large input (example see below) within 120000 milliseconds.
I assume that one problem are two loops (in solution
and next-transformation
) which can be unified into one.
Are there any other performance bottlenecks in this code?
Here is the test that fails because solution
takes more than 120000 milliseconds to run:
(ns smallest-sum.test
(:require [smallest-sum.core :refer :all]
[clojure.test :refer :all]))
(deftest sample-test-cases
(is (< 0 (solution [
91293
38437
40626
173
76990
17858
43446
25050
10791
68990
52403
21503
52331
51909
73488
91293
38437
40626
173
76990
17858
43446
25050
10791
68990
52403
21503
52331
51909
73488
91293
38437
40626
173
76990
17858
43446
25050
10791
68990
52403
21503
52331
51909
73488
91293
38437
40626
173
76990
17858
43446
25050
10791
68990
52403
21503
52331
51909
73488
]) ))
)
Update: After reading the answers as well as this question I rewrote the code as follows. Now it seems to work (is fast enough).
(defn gcd
[a b]
(if (= b 0)
a
(gcd b (mod a b)))
)
(defn gcd-arr
[arr]
(reduce gcd arr)
)
(defn solution
[arr]
(let [
greatest-common-divisor (gcd-arr arr)
arr-len (count arr)
]
(* arr-len greatest-common-divisor)
)
)
I find this code hard to reason about. If you could provide a description of what the code is attempting to do, it might be possible to give better suggestions.
But based on what I see running these two functions solution
takes a vector and decreases the values somehow until they are all the same value. and then returns the size of the vector times that value that is the same. (Or the sum of the reduced values) next-transformation
picks out two indexes from the vector that have different values, that solution
then reduces the larger by the difference.
One of the indexes returned by next-transformation is always zero.
The result of next-transformation
is:
nil
when all the elements in the vector are the same.[0 first-index-of-element-less-than-first]
when one or more elements of the vector are less than the first element.[first-index-of-element-greater-than-first 0]
.The above results can be calculated in a single pass instead of a nested loop. Something like:
(defn nt-1 [arr]
(let [f (first arr)]
(loop [i 1
bigger nil]
(cond (>= i (count arr)) (if (nil? bigger)
nil
[bigger 0])
(> f (arr i)) [0 i]
:else (recur (inc i)
(cond (not (nil? bigger)) bigger
(< f (arr i)) i
:else nil))))))
I don't understand solution
well enough if the following would be correct, but another possibility may be to exit with the first element that is not equal:
(defn nt-2 [arr]
(let [f (first arr)]
(loop [i 1]
(cond (>= i (count arr)) nil
(< f (arr i)) [i 0]
(> f (arr i)) [0 i]
:else (recur (inc i))))))
Timing:
;; Using original next-transformation
user> (time (solution (into [] (repeat 100000 10000))))
"Elapsed time: 269826.483125 msecs"
1000000000
user> (def next-transformation nt-1)
#'user/next-transformation
user> (time (solution (into [] (repeat 1000000 10000))))
"Elapsed time: 116.421625 msecs"
10000000000
user> (def next-transformation nt-2)
#'user/next-transformation
user> (time (solution (into [] (repeat 1000000 10000))))
"Elapsed time: 87.211333 msecs"
10000000000
Using your test data:
;; Original next-transformation
user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488 91293 38437 40626
173 76990 17858 43446 25050 10791 68990 52403 21503
52331 51909 73488 91293 38437 40626 173 76990 17858
43446 25050 10791 68990 52403 21503 52331 51909 73488
91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488]))
"Elapsed time: 112973.434458 msecs"
60
user> (def next-transformation nt-1)
#'user/next-transformation
user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488 91293 38437 40626
173 76990 17858 43446 25050 10791 68990 52403 21503
52331 51909 73488 91293 38437 40626 173 76990 17858
43446 25050 10791 68990 52403 21503 52331 51909 73488
91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488]))
"Elapsed time: 1630.157375 msecs"
60
Zero and negative numbers seemed to lead to infinite loop
In the case of at least one positive number and at least one non-positive number solution
appeared to enter an infinite loop. I am assuming the data being used are positive integers.
Once (arr 0) is one, the result is (count arr).
My gut says that usually one of the elements will be reduced to one, leading to the result to be (count arr)
most of the time. My gut also says that is an uninteresting function to mostly, but not always, return (count arr)
so I might be wrong.
In the cases where the result is (count arr)
exiting as soon as (= 1 (arr 0))
would provide a performance boost.
(defn s-1 [arr]
(loop [state {:arr arr :sum 0}]
(let [cur-arr (get state :arr)
trx (next-transformation cur-arr)]
(cond (nil? trx) (* (count cur-arr)
(nth cur-arr 0 0))
(= 1 (first cur-arr)) (count arr)
:else (let [i (nth trx 0)
j (nth trx 1)
xi (nth cur-arr i)
xj (nth cur-arr j)
diff (- xi xj)]
(recur (assoc state :arr (assoc cur-arr
i
diff))))))))
Timing:
user> (def solution s-1)
;; next-transformation has been restored to original
user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488 91293 38437 40626
173 76990 17858 43446 25050 10791 68990 52403 21503
52331 51909 73488 91293 38437 40626 173 76990 17858
43446 25050 10791 68990 52403 21503 52331 51909 73488
91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488]))
"Elapsed time: 9.182584 msecs"
60
#'user/next-transformation
user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488 91293 38437 40626
173 76990 17858 43446 25050 10791 68990 52403 21503
52331 51909 73488 91293 38437 40626 173 76990 17858
43446 25050 10791 68990 52403 21503 52331 51909 73488
91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488]))
"Elapsed time: 2.379 msecs"
60
Cleaning up solution
, per my tastes:
;; :sum is never used. So there is no reason
;; to have `state` with two entries.
;; (Not that there ever was:
;; `(loop [arr arr sum 0]...)`)
;; would have been fine.
;; With only `arr` changing, we can
;; recur to the function top, without
;; `loop`
(defn s-2 [arr]
(let [trx (next-transformation arr)]
(cond (nil? trx) (* (count arr)
(nth arr 0 0)) ; Last `0` provides a default
(= 1 (first arr)) (count arr)
:else (let [[i j] trx ;destructuring
diff (- (arr i) (arr j))]
(recur (assoc arr i diff))))))
Timing:
user> (def solution s-2)
#'user/solution
user> (time (solution [91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488 91293 38437 40626
173 76990 17858 43446 25050 10791 68990 52403 21503
52331 51909 73488 91293 38437 40626 173 76990 17858
43446 25050 10791 68990 52403 21503 52331 51909 73488
91293 38437 40626 173 76990 17858 43446 25050 10791
68990 52403 21503 52331 51909 73488]))
"Elapsed time: 2.116625 msecs"
60
Using time
once is not great benchmarking, but given the size of the differences seen I believe it is still directional in this case.