redis · hashmap · load-factor

Why is the load factor of the hashmap in Redis as large as 5?


In algorithm classes and authoritative textbooks, the load factor is kept below 1; Java's HashMap, for example, defaults to 0.75. Yet in the Redis source code, the load factor is 5:

/* Using dictEnableResize() / dictDisableResize() we make possible to
 * enable/disable resizing of the hash table as needed. This is very important
 * for Redis, as we use copy-on-write and don't want to move too much memory
 * around when there is a child performing saving operations.
 *
 * Note that even when dict_can_resize is set to 0, not all resizes are
 * prevented: a hash table is still allowed to grow if the ratio between
 * the number of elements and the buckets > dict_force_resize_ratio. */
static int dict_can_resize = 1;
static unsigned int dict_force_resize_ratio = 5;

Why is that?


Solution

  • The load factor at which rehashing starts is ~1. The dict_force_resize_ratio value is a safety valve: even when resizing has been disabled, a table whose elements-to-buckets ratio reaches that value is forcibly expanded anyway.

    You can see this in _dictExpandIfNeeded(dict *d) in dict.c:

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used*2);
    }
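
    To make the numbers concrete, here is a small stand-alone sketch of the same check (hypothetical code, not from Redis): with resizing disabled, a table of 100 buckets holding 400 entries is left alone, but once the integer elements/buckets ratio exceeds dict_force_resize_ratio the expansion is forced anyway.

    #include <stdio.h>

    /* Hypothetical stand-alone rendering of the expand check above. */
    static int should_expand(unsigned long used, unsigned long size,
                             int can_resize, unsigned int force_ratio)
    {
        return used >= size &&
               (can_resize || used / size > force_ratio);
    }

    int main(void) {
        /* Resizing disabled, as during a copy-on-write save. */
        printf("%d\n", should_expand(400, 100, 0, 5)); /* 0: load factor 4 tolerated   */
        printf("%d\n", should_expand(600, 100, 0, 5)); /* 1: ratio 6 > 5, expand forced */
        /* Resizing enabled: reaching the 1:1 threshold is enough. */
        printf("%d\n", should_expand(101, 100, 1, 5)); /* 1 */
        return 0;
    }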
    

    Redis can afford to start rehashing at ~1 because the rehashing is not done all at once: it proceeds incrementally, maintaining two hash tables while entries migrate from the old table to the new one.

    See dict.h:

    /* This is our hash table structure. Every dictionary has two of this as we
     * implement incremental rehashing, for the old to the new table. */
    typedef struct dictht {
        dictEntry **table;
        unsigned long size;
        unsigned long sizemask;
        unsigned long used;
    } dictht;
    
    typedef struct dict {
        dictType *type;
        void *privdata;
        dictht ht[2];
        long rehashidx; /* rehashing not in progress if rehashidx == -1 */
        unsigned long iterators; /* number of iterators currently running */
    } dict;
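
    The price of keeping two tables is that, while a rehash is in progress, a lookup must consult both. Here is a minimal sketch of that path, modeled on dictFind() in dict.c; hash() and keys_equal() are hypothetical stand-ins for the dictType callbacks, and dictEntry is the chained-entry type behind the table field above:

    dictEntry *sketch_find(dict *d, const void *key) {
        unsigned long h = hash(key);             /* hypothetical helper */
        for (int table = 0; table <= 1; table++) {
            unsigned long idx = h & d->ht[table].sizemask;
            for (dictEntry *he = d->ht[table].table[idx]; he != NULL; he = he->next)
                if (keys_equal(he->key, key))    /* hypothetical helper */
                    return he;
            /* No rehash in progress: ht[1] is empty, stop after ht[0]. */
            if (d->rehashidx == -1)
                return NULL;
        }
        return NULL;
    }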
    

    And in dict.c:

    /* Performs N steps of incremental rehashing. Returns 1 if there are still
     * keys to move from the old to the new hash table, otherwise 0 is returned.
     *
     * Note that a rehashing step consists in moving a bucket (that may have more
     * than one key as we use chaining) from the old to the new hash table, however
     * since part of the hash table may be composed of empty spaces, it is not
     * guaranteed that this function will rehash even a single bucket, since it
     * will visit at max N*10 empty buckets in total, otherwise the amount of
     * work it does would be unbound and the function may block for a long time. */
    int dictRehash(dict *d, int n) {...
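
    For a feel of what one call does, here is a simplified, hypothetical sketch of its core loop. The real function also frees ht[0] and swaps the tables once the last key has moved; that bookkeeping is elided, and hash() again stands in for the dictType callback:

    int sketch_rehash(dict *d, int n) {
        int empty_visits = n * 10;                 /* max empty buckets to scan */
        while (n-- && d->ht[0].used != 0) {
            /* Skip empty buckets, but only so many of them. */
            while (d->ht[0].table[d->rehashidx] == NULL) {
                d->rehashidx++;
                if (--empty_visits == 0) return 1;
            }
            /* Move the whole chain of this bucket into ht[1]. */
            dictEntry *de = d->ht[0].table[d->rehashidx];
            while (de) {
                dictEntry *next = de->next;
                unsigned long idx = hash(de->key) & d->ht[1].sizemask;
                de->next = d->ht[1].table[idx];
                d->ht[1].table[idx] = de;
                d->ht[0].used--;
                d->ht[1].used++;
                de = next;
            }
            d->ht[0].table[d->rehashidx] = NULL;
            d->rehashidx++;
        }
        return d->ht[0].used != 0;                 /* 1 while keys remain */
    }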
    

    And there is some additional insight in redis.conf, in the comments for the activerehashing setting:

    # Active rehashing uses 1 millisecond every 100 milliseconds of CPU time in
    # order to help rehashing the main Redis hash table (the one mapping top-level
    # keys to values). The hash table implementation Redis uses (see dict.c)
    # performs a lazy rehashing: the more operations you run on a hash table
    # that is rehashing, the more rehashing "steps" are performed, so if the
    # server is idle the rehashing is never complete and some more memory is used
    # by the hash table.
    #
    # The default is to use this millisecond 10 times every second in order to
    # actively rehash the main dictionaries, freeing memory when possible.
    #
    # If unsure:
    # use "activerehashing no" if you have hard latency requirements and it is
    # not a good thing in your environment that Redis can reply from time to time
    # to queries with 2 milliseconds delay.
    #
    # use "activerehashing yes" if you don't have such hard requirements but
    # want to free memory asap when possible.
    activerehashing yes
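
    In other words, with activerehashing enabled the server periodically spends a small, bounded slice of time driving dictRehash() forward even when no commands are touching the dictionary. A sketch of that time-boxed driver, modeled on dictRehashMilliseconds() in dict.c (the timing helper is hypothetical):

    #include <sys/time.h>

    /* Run batches of 100 incremental rehash steps until roughly `ms`
     * milliseconds of wall-clock time have passed or the table is done. */
    static long long time_in_ms(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return ((long long)tv.tv_sec) * 1000 + tv.tv_usec / 1000;
    }

    int sketch_rehash_ms(dict *d, int ms) {
        long long start = time_in_ms();
        int steps = 0;
        while (dictRehash(d, 100)) {     /* returns 1 while work remains */
            steps += 100;
            if (time_in_ms() - start > ms) break;
        }
        return steps;                    /* approximate steps performed */
    }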