Search code examples
clojurebenchmarking

Protocol functions return a result faster if the constant it returns isn't cached


I'm writing an Entity Component System. Part of it is a protocol that defines how Systems are used. Part of the protocol is a function that returns the components that the system requires to function. It can be distilled down to the following:

(defprotocol System
  (required-components [sys]
    "Returns a map of keys and initial values of components that the System requires to operate.")

Because the value that this function returns is really a constant, I thought that caching it would probably be a good idea because it might be required +60 times a second. To tell whether or not it makes a difference though, I wrote up the following test:

(defrecord Input-System1 []
  System
  (required-components [sys] {:position [0 0] :angle 0}))

(def req-comps
  {:position [0 0] :angle 0})

(defrecord Input-System2 []
  System
  (required-components [sys] req-comps))

Then in a REPL, I ran the following time tests:

(let [sys (->Input-System1)]
  (time-multiple 1000000000
    (required-components sys)))

(let [sys (->Input-System2)]
  (time-multiple 1000000000
    (required-components sys)))

(Code for time-multiple below).

The odd thing is, Input-System1 consistently finishes faster than Input-System2: 2789.973066ms vs 3800.345803ms on the last run.

I find this odd though given how, in theory, version 1 is constantly recreating the component map, while version 2 only references a pre-defined value.

I tried recreating this by cutting protocols out:

(defn test-fn1 []
  req-comps)

(defn test-fn2 []
  {:position [0 0] :angle 0})

(time-multiple 1000000000
  (test-fn1))

(time-multiple 1000000000
  (test-fn2))

But this time, the results end up being almost identical: 3789.478675ms vs 3767.577814ms.

This leads me to believe it has something to do with protocols, but I can't tell what. What's going on here? I know that given the number of tests, 1000ms is fairly insignificant, so I'm not trying to micro-optimize here. I'm just curious.

(defmacro time-pure
  "Evaluates expr and returns the time it took.
  Modified the native time macro to return the time taken."
  [expr]
  `(let [start# (current-nano-timestamp)
         ret# ~expr]
     (/ (double (- (current-nano-timestamp) start#)) 1000000.0)))


(defmacro time-multiple
  "Times the expression multiple times, returning the total time taken, in ms"
  [times expr]
  `(time-pure
     (dotimes [n# ~times]
      ~expr)))

Solution

  • In either case your map is a constant, created during class loading (it's statically known, so no new object creation during the method call.) On the other hand, your cached case costs you an extra indirection - accessing a var.

    To demonstrate:

    (def req-comps
       {:position [0 0] :angle 0})
    
    (defn asystem-1 []
       {:position [0 0] :angle 0})
    
    (defn asystem-2 []
       req-comps)
    

    (It doesn't matter whether we deal with protocols or not - the functions are compiled the same, it's just easier to find them in the compiled code this way.)

    public final class core$asystem_1 extends AFunction {
        public static final AFn const__4 = (AFn)RT.map(new Object[]{RT.keyword((String)null, "position"), Tuple.create(Long.valueOf(0L), Long.valueOf(0L)), RT.keyword((String)null, "angle"), Long.valueOf(0L)});
    
        public core$asystem_1() {
        }
    
        public static Object invokeStatic() {
            return const__4;
        }
    
        public Object invoke() {
            return invokeStatic();
        }
    }
    

    See - it just returns the precalculated constant.

    public final class core$asystem_2 extends AFunction {
        public static final Var const__0 = (Var)RT.var("so.core", "req-comps");
    
        public core$asystem_2() {
        }
    
        public static Object invokeStatic() {
            return const__0.getRawRoot();
        }
    
        public Object invoke() {
            return invokeStatic();
        }
    }
    

    An extra call to getRawRoot().