python, memory, go, redis, redigo

Excessive RAM usage by Redis


I'm developing an API using Go and Redis. The problem is that RAM usage is unreasonably high and I can't find the root cause.

TL;DR version

There are hundreds/thousands of hash objects. Each ~1 KB object (key + value) takes ~0.5 MB of RAM. However, there is no memory fragmentation (INFO shows none).

Also, dump.rdb is ~70 times smaller than the in-memory data set (360 kB dump.rdb vs 25 MB of RAM for 50 objects, and 35.5 MB vs 2.47 GB for 5000 objects).

Long version

The Redis instance is filled mostly with task:123 hashes of the following kind:

    "task_id"       : int
    "client_id"     : int
    "worker_id"     : int
    "text"          : string (0..255 chars)
    "is_processed"  : boolean
    "timestamp"     : int
    "image"         : byte array (1 kbyte) 

Also, there are a couple of integer counters, one list, and one sorted set (both consisting of task_ids).

RAM usage has a linear dependency on the number of task objects.

INFO output for 50 tasks:

# Memory
used_memory:27405872
used_memory_human:26.14M
used_memory_rss:45215744
used_memory_peak:31541400
used_memory_peak_human:30.08M
used_memory_lua:35840
mem_fragmentation_ratio:1.65
mem_allocator:jemalloc-3.6.0

and 5000 tasks:

# Memory
used_memory:2647515776
used_memory_human:2.47G
used_memory_rss:3379187712
used_memory_peak:2651672840
used_memory_peak_human:2.47G
used_memory_lua:35840
mem_fragmentation_ratio:1.28
mem_allocator:jemalloc-3.6.0

The size of dump.rdb is 360 kB for 50 tasks and 35,553 kB for 5000 tasks.

Every task object has a serializedlength of ~7 KB:

127.0.0.1:6379> DEBUG OBJECT task:2000
Value at:0x7fcb403f5880 refcount:1 encoding:hashtable serializedlength:7096 lru:6497592 lru_seconds_idle:180
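
One way to double-check what is actually stored is to fetch the image field from Go and print its length (a hypothetical snippet, not part of my API; it assumes the gomodule/redigo client and a local Redis on the default port):

package main

import (
    "fmt"
    "log"

    "github.com/gomodule/redigo/redis"
)

func main() {
    conn, err := redis.Dial("tcp", "localhost:6379")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // The image is supposed to be ~1 KB; if this prints something much
    // larger, the writer is storing more bytes than intended.
    img, err := redis.Bytes(conn.Do("HGET", "task:2000", "image"))
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("stored image length:", len(img))
}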

I've written a Python script trying to reproduce the problem:

import redis
import time
import os 
from random import randint

img_size = 1024 * 1 # 1 kb
r = redis.StrictRedis(host='localhost', port=6379, db=0)

for i in range(0, 5000):
    values = { 
        "task_id"   : randint(0, 65536),
        "client_id" : randint(0, 65536),
        "worker_id" : randint(0, 65536),
        "text"      : "",
        "is_processed" : False,
        "timestamp" : int(time.time()),
        "image"     : bytearray(os.urandom(img_size)),
    }
    key = "task:" + str(i)
    r.hmset(key, values)
    if i % 500 == 0: print(i)

And it consumes just 80MB of RAM!

I would appreciate any ideas on how to figure out what's going on.


Solution

  • You have lots and lots of small HASH objects, and that's fine. But each of them carries a lot of overhead in Redis memory, since each one is stored as a separate dictionary. There is a small optimization for this that usually improves things significantly: keep the hashes in a memory-optimized but slightly slower data structure (a ziplist), which at these object sizes should not matter much. From the config:

    # Hashes are encoded using a memory efficient data structure when they have a
    # small number of entries, and the biggest entry does not exceed a given
    # threshold. These thresholds can be configured using the following directives. 
    hash-max-ziplist-entries 512
    hash-max-ziplist-value 64
    

    Now, you have large values, which causes this optimization not to kick in. I'd set hash-max-ziplist-value to a few KB (depending on the size of your largest object), and it should improve this (you should not see any performance degradation at this HASH size); see the sketch at the end of this answer.

    Also, keep in mind that Redis compresses its RDB files relative to what it keeps in memory, so a ~50% reduction compared to memory is to be expected anyway.

    [EDIT] After re-reading your question and seeing that it's a Go-only problem, and considering the fact that the compressed RDB is small, something tells me you're writing a bigger payload than you expect for the image. Any chance you're writing it from a []byte slice? If so, perhaps you did not trim it and are writing a much bigger buffer than intended? I've worked like this with redigo tons of times and never seen what you're describing.
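
    To make both points concrete, here is a rough sketch in Go with redigo (a hypothetical example, not your actual code; it assumes the gomodule/redigo client and a local Redis on the default port): raise hash-max-ziplist-value so ~1 KB values keep the compact encoding, and write the image from a slice of exactly the bytes that were read, not from the whole backing buffer.

    package main

    import (
        "crypto/rand"
        "fmt"
        "log"

        "github.com/gomodule/redigo/redis"
    )

    func main() {
        conn, err := redis.Dial("tcp", "localhost:6379")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        // Let ~1 KB field values keep the compact hash encoding. This only
        // affects hashes written after the change; hashes that were already
        // converted to a real hashtable stay converted.
        if _, err := conn.Do("CONFIG", "SET", "hash-max-ziplist-value", 2048); err != nil {
            log.Fatal(err)
        }

        // Simulate reading a 1 KB image into a much larger scratch buffer.
        buf := make([]byte, 64*1024)
        n, err := rand.Read(buf[:1024])
        if err != nil {
            log.Fatal(err)
        }

        // Write only the bytes that were actually read: buf[:n], not buf.
        // Passing the whole buffer is the kind of mistake that silently turns
        // a 1 KB image into a 64 KB field.
        _, err = conn.Do("HMSET", "task:demo",
            "task_id", 2000,
            "image", buf[:n],
        )
        if err != nil {
            log.Fatal(err)
        }

        // Verify: encoding should be "ziplist" (on pre-7.0 Redis) and
        // serializedlength should be roughly the size of the image.
        enc, _ := redis.String(conn.Do("OBJECT", "ENCODING", "task:demo"))
        dbg, _ := redis.String(conn.Do("DEBUG", "OBJECT", "task:demo"))
        fmt.Println("encoding:", enc)
        fmt.Println(dbg)
    }

    If an untrimmed buffer was indeed being written, the serializedlength reported by DEBUG OBJECT should drop dramatically once only buf[:n] is stored.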