Search code examples
clojurejvmprofilinghprof

Please help me understand these Clojure Hprof traces


I have some Clojure code that is simulating and then processing numerical data. The data are basically vectors of double values; the processing mainly involves summing their values in various ways. I will include some code below, but my question is (I think) more general - I just don't have a clue how to interpret the hprof results.

Anyway, my test code is:

(defn spin [n]
  (let [c 6000
        signals (spin-signals c)]
      (doseq [_ (range n)] (time (spin-voxels c signals)))))

(defn -main []
  (spin 4))

where spin-voxels should be more expensive than spin-signals (especially when repeated multiple times). I can give the lower level routines, but I think this question is more about me not understanding basics of the traces (below).

When I compile this with lein and then do some simple profiling:

> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.vec com.isti.compset.stack
"Elapsed time: 14118.772924 msecs"
"Elapsed time: 10082.015672 msecs"
"Elapsed time: 9212.522973 msecs"
"Elapsed time: 12968.23877 msecs"
Dumping CPU usage by sampling running threads ... done.

and the profile trace looks like:

CPU SAMPLES BEGIN (total = 4300) Sun Aug 28 15:51:40 2011
rank   self  accum   count trace method
   1  5.33%  5.33%     229 300791 clojure.core$seq.invoke
   2  5.21% 10.53%     224 300786 clojure.core$seq.invoke
   3  5.05% 15.58%     217 300750 clojure.core$seq.invoke
   4  4.93% 20.51%     212 300787 clojure.lang.Numbers.add
   5  4.74% 25.26%     204 300799 clojure.core$seq.invoke
   6  2.60% 27.86%     112 300783 clojure.lang.RT.more
   7  2.51% 30.37%     108 300803 clojure.lang.Numbers.multiply
   8  2.42% 32.79%     104 300788 clojure.lang.RT.first
   9  2.37% 35.16%     102 300831 clojure.lang.RT.more
  10  2.37% 37.53%     102 300840 clojure.lang.Numbers.add

which is pretty cool. Up to here, I am happy. I can see that I am wasting time with generic handling of numerical values.

So I look at my code and decide that, as a first step, I will replace vec with d-vec:

(defn d-vec [collection]
  (apply conj (vector-of :double) collection))

I am not sure that will be sufficient - I suspect I will also need to add some type annotations in various places - but it seems like a good start. So I compile and profile again:

> java -cp classes:lib/clojure-1.3.0-beta1.jar -agentlib:hprof=cpu=samples,depth=10,file=hprof.d-vec com.isti.compset.stack
"Elapsed time: 15944.278043 msecs"
"Elapsed time: 15608.099677 msecs"
"Elapsed time: 16561.659408 msecs"
"Elapsed time: 15416.414548 msecs"
Dumping CPU usage by sampling running threads ... done.

Ewww. So it is significantly slower. And the profile?

CPU SAMPLES BEGIN (total = 6425) Sun Aug 28 15:55:12 2011
rank   self  accum   count trace method
   1 26.16% 26.16%    1681 300615 clojure.core.Vec.count
   2 23.28% 49.45%    1496 300607 clojure.core.Vec.count
   3  7.74% 57.18%     497 300608 clojure.lang.RT.seqFrom
   4  5.59% 62.77%     359 300662 clojure.core.Vec.count
   5  3.72% 66.49%     239 300604 clojure.lang.RT.first
   6  3.25% 69.74%     209 300639 clojure.core.Vec.count
   7  1.91% 71.66%     123 300635 clojure.core.Vec.count
   8  1.03% 72.68%      66 300663 clojure.core.Vec.count
   9  1.00% 73.68%      64 300644 clojure.lang.RT.more
  10  0.79% 74.47%      51 300666 clojure.lang.RT.first
  11  0.75% 75.22%      48 300352 clojure.lang.Numbers.double_array
  12  0.75% 75.97%      48 300638 clojure.lang.RT.more
  13  0.64% 76.61%      41 300621 clojure.core.Vec.count
  14  0.62% 77.23%      40 300631 clojure.core.Vec.cons
  15  0.61% 77.84%      39 300025 java.lang.ClassLoader.defineClass1
  16  0.59% 78.43%      38 300670 clojure.core.Vec.cons
  17  0.58% 79.00%      37 300681 clojure.core.Vec.cons
  18  0.54% 79.55%      35 300633 clojure.lang.Numbers.multiply
  19  0.48% 80.03%      31 300671 clojure.lang.RT.seqFrom
  20  0.47% 80.50%      30 300609 clojure.lang.Numbers.add

I have included more rows here because this is the part I don't understand.

Why on earth is Vec.count appearing so often? It's a method that returns the size of the vector. A single line lookup of an attribute.

I assume I am slower because I am still jumping back and forth between Double and double, and that things may improve again when I add more type annotations. But I don't understand what I have now, so I am not so sure blundering fowards makes much sense.

Please, can anyone explain the dump above in general terms? I promise I don't repeatedly call count - instead I have lots of maps and reduces and a few explicit loops.

I wondered if I am perhaps being confused by the JIT? Maybe I am missing a pile of information because functions are being inlined? Oh, and I am using 1.3.0-beta1 because it appears to have more sensible number handling.

[UPDATE] I summarised my experiences at http://www.acooke.org/cute/Optimising1.html - got a 5x speedup (actually was 10x after cleaning some more and moving to 1.3) despite never understanding this.


Solution

  • calling seq for a Vec object (object created by vector-of) creates a VecSeq Object.

    VecSeq object created on Veca calls Vec.count in it's method internal-reduce, which is used by clojure.core/reduce.

    so it seems that a vector created by vector-of calls Vec.count while reducing. And as you mentioned that the code did a lot of reducing this seems to be the cause

    What remains spooky is that Vec.count is that Vec.count seems to be very simple:

    clojure.lang.Counted
      (count [_] cnt)
    

    a simple getter that doesn't do any counting.