Search code examples
performancedarthashmaphashsetprimitive

Primitive type HashSet or HashMap 10 times faster than Object?


Update: This should be considered a bug in flutter channel stable v1.9.1+hotfix.4. When switched to channel master this is fixed, int and Object Sets have performances of the same order of magnitude.

A benchmark throws that HashSet<int> performs an average of 10 times faster than HashSet<Object>. What is the reason of this behavior?

I was having a big performance drop (over an order of magnitude) when I changed the internals of a library in a way that I expected to be more efficient.

The problem was changing from a HashSet<int> to a HashSet<MyClass>, which I confirmed when running the simple benchmark in a HashSet that adds the same value 1000 times. The result is that HashSet<int> performs an average of 10 times faster than HashSet<Object>.

Any suggestions from a code design perspective? The code looks much cleaner when having a HashSet<MyClass> instead of having additional data structures that relate the instances to an int. This performance drop is important since it's the core of the library. The only way I have found to keep the program efficient, is to manipulate everything with an integer that identifies MyClass.

The benchmark can be found here: https://gist.github.com/icatalud/dc28bd3bdd7c13b39c308b7abb9a9d8c

The function that adds to the Set is the following:

plainAddSet<T>(T obj, [int n = 1000]) {
  var s = Set<T>();
  for (var i = 0; i < n; i++) {
    s.add(obj);
  }
}

Update: Running more tests led me to the root cause of the performance drop. The hashCode getter is extremely slow. In the following class if the hashCode method is not overriden, it takes 20 times longer, even slower than Object. Otherwise it takes slightly more than an int Set.

class Identifiable {
  static int lastId = 0;
  int id;
  Identifiable() : id = lastId++;

  // Not overriding this getter, causes the benchmark to take 20 times longer.
  int get hashCode => id;

  bool operator ==(other) =>
      other.runtimeType == runtimeType &&
      hashCode == other.hashCode;
}

Update 2: Still, it would be interesting to understand why overriding the == operator and accessing the hashCode from there is three times slower than leaving the Object default implementation (which is more than 10 times slower than int).

class Identifiable {
  bool operator ==(other) =>
      other.runtimeType == runtimeType &&
      other.hashCode == hashCode;
}

Update 3: There is an open issue about this in the dart SDK repository. It has been there for over 4 years.


Solution

  • I have tested with different class implementations and can see the waiting time are for hashCode. This makes sense since the hashCode for an integer are just the same integer as it represent (makes sense if you think about it).

    The hashCode for Object() are more complicated and needs to make an external call into some code I don't know.

    If I create my own class and override the hashCode property with something like "always return 5" then I get the same speed as the integer.

    Since hashCode (and ==) are being used every time you add something to a Map or Set, it is important to keep this two functions as efficient as possible.

    Answer to Update 2

    The reason why the Object() hashCode property are slower is because it does a lot than the integer hashCode. I have checked the source code and this is the steps it does:

    As you can see this is a lot more complicated than just taking the value of the integer itself and use that as the hash. As you also can see in the code there are even usage of Random generators.

    I should add, that the Hash are cached so it is only the first time calling hashCode it is really slow. But as you can see the cached version are still an external call which are more expensive than just getting an integer variable.

    Update for "Proposing a workaround to this issue"

    This is happening in flutter channel stable v1.9.1+hotfix.4, but is fixed in master. It shouldn't be necessary to implement in the next Flutter releases.

    If you can't accept the performance of the standard hashCode method right now, I think the only other way is to manually cache it like:

    class ObjectWithCachedHash {
      int _hashCodeCache;
    
      @override
      int get hashCode => _hashCodeCache ??= super.hashCode;
    }
    

    This should only be done if the hashCode are not dependent on any values inside the object (e.g. the standard hashCode implementation which are purely instance based).

    Updated with link to issue

    The problem with slowish hashCode is a know issue (thanks Ignacio Catalan for pointing that out):

    https://github.com/dart-lang/sdk/issues/24206

    It should be noted that the performance problem found in this stackoverflow question are more likely related to a recent regression there has been fixed on master:

    https://github.com/dart-lang/sdk/issues/24206#issuecomment-539448012

    But the fix for this regression will only fix the issue for x64, so 32 bit architectures (and properly also all ARM architectures) will still be affected by the performance issue (which are another issue which are not related to the regression).

    So the recommendation for 32bit and ARM users would be to locally cache the hashCode value if you code depends on the Object.hashCode and are going to make a lot comparisons (e.g. using Map's or Set's with larger amount of objects).