Search code examples
clojure

Clojure Set vs Map Lookup Performance Differences


I have a list of uids and want to check if a uid is a member of this list

The natural way to implement it would be to create a set (clojure.set) of uids and search for that member on that list

What I found out is that map key lookup is a lot faster - I used the following snippet to benchmark both methods:

(def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
(def uids-map (reduce (fn [acc v] (assoc acc v true)) {} uids))
(time (dotimes [i 1000000] (:o1 uids)))
;user=> "Elapsed time: 191.076266 msecs"
(time (dotimes [i 1000000] (:o1 uids-map)))
;user=> "Elapsed time: 38.159388 msecs"

the results were very consistent across invocations - map lookup took about 1/5 of set lookup

So is set not optimal for key lookup or am I using it the wrong way?

Also, what are the reasons for the differences in these benchmarks?

I was under the impression that a set is implemented in clojure as an associative data structure similar to vectors - so why is key lookup significantly slower than a simple map?


Solution

  • I've never went into clojure's source but from what I see the set implementation actually uses a map inside:

    protected APersistentSet(IPersistentMap impl){
        this.impl = impl;
    }
    

    It also delegates the invoke call to the internal map.

    In APersistentSet:

    public Object invoke(Object arg1) {
        return get(arg1);
    }
    
    // ....
    
    public Object get(Object key){
        return impl.valAt(key);
    }
    

    In APersistentMap:

    public Object invoke(Object arg1) {
        return valAt(arg1);
    }
    
    public Object invoke(Object arg1, Object notFound) {
        return valAt(arg1, notFound);
    }
    

    So this can't explain the difference.

    As mentioned in the comments by @cgrand, when we reverse the arguments its faster (and about the same, since we call set's invoke immediately). So I looked up Keyword's invoke which is what is probably used for (:k obj):

    final public Object invoke(Object obj, Object notFound) {
        if(obj instanceof ILookup)
            return ((ILookup)obj).valAt(this,notFound);
        return RT.get(obj, this, notFound);
    }
    

    The important thing to notice is that ILookup is implemented in APersistentMap (through Associative) but not in APersistentSet. You can also verify in clojure:

    (instance? clojure.lang.ILookup #{}) ;; false
    (instance? clojure.lang.ILookup {})  ;; true
    

    So maps go through the "happy path" and sets end up in RT.get which I believe is the runtime.

    Lets have a look at the runtime.

    It Initially attempts to do practically the same thing as keyword:

    static public Object get(Object coll, Object key){
        if(coll instanceof ILookup)
            return ((ILookup) coll).valAt(key);
        return getFrom(coll, key);
    }
    

    But since we know sets do not implement ILookup we know they go to RT.getFrom:

    static Object getFrom(Object coll, Object key){
        if(coll == null)
            return null;
        else if(coll instanceof Map) {
            Map m = (Map) coll;
            return m.get(key);
        }
        else if(coll instanceof IPersistentSet) {
            IPersistentSet set = (IPersistentSet) coll;
            return set.get(key);
        }
        else if(key instanceof Number && (coll instanceof String || coll.getClass().isArray())) {
            int n = ((Number) key).intValue();
            if(n >= 0 && n < count(coll))
                return nth(coll, n);
            return null;
        }
        else if(coll instanceof ITransientSet) {
            ITransientSet set = (ITransientSet) coll;
            return set.get(key);
        }
    
        return null;
    }
    

    Which leads me to believe the main difference is the extra delegations and instanceof calls due to sets not implementing ILookup.

    As bonus I've added a test on a set that implements ILookup and delegates valAt to the internal map immediately (using proxy) which closed the gap a bit:

    (def uids #{:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p :a1 :b1 :c1 :d1 :e1 :f1 :h1 :i1 :j1 :k1 :l1 :m1 :n1 :o1 :p1})
    (def uids-map (into {} (for [k uids] [k k])))
    (def lookupable-set (proxy [clojure.lang.APersistentSet clojure.lang.ILookup] [uids-map]
                          (valAt [k] (get uids-map k))))
    
    ;; verify
    (instance? clojure.lang.APersistentSet lookupable-set) ;; true
    (instance? clojure.lang.ILookup lookupable-set) ;; true
    
    (time (dotimes [i 1000000] (:o1 uids))) ;; 134.703101 msecs
    (time (dotimes [i 1000000] (:o1 lookupable-set))) ;; 63.187353 msecs  <-- faster
    (time (dotimes [i 1000000] (:o1 uids-map))) ;; 35.802762 msecs <-- still fastest
    

    To conclude: Where performance matters - invoking the set (#{...} k) without going through keyword (k #{...}) is as fast as map.

    But I could be wrong :)