We are running Redis via ElastiCache on AWS and are seeing memory usage spike when running a large number of Lambda functions that only perform reads. Here is some example output from redis-cli --stat:
------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections
1002       28.11M   15      0       2751795 (+11)       53877
1002       28.07M   15      0       2751797 (+2)        53877
1002       28.07M   15      0       2751799 (+2)        53877
1002       28.11M   15      0       2751803 (+4)        53877
1002       28.07M   15      0       2751806 (+3)        53877
1001       28.11M   15      0       2751808 (+2)        53877
1007       28.08M   15      0       2751837 (+29)       53877
1007       28.08M   15      0       2751839 (+2)        53877
1005       28.10M   16      0       2751841 (+2)        53878
1007       171.68M  94      0       2752012 (+171)      53957
1006       545.93M  316     0       2752683 (+671)      54179
1006       1.07G    483     0       2753508 (+825)      54346
1006       1.54G    677     0       2754251 (+743)      54540
1006       1.98G    882     0       2755024 (+773)      54745
1006       2.35G    1010    0       2755776 (+752)      54873
1005       2.78G    1014    0       2756548 (+772)      54877
1005       2.80G    1014    0       2756649 (+101)      54877
1004       2.79G    1014    0       2756652 (+3)        54877
1008       2.79G    1014    0       2756682 (+30)       54877
1007       2.79G    1014    0       2756685 (+3)        54877
As you can see, the number of keys is pretty much constant, but as the number of clients increases the memory usage ramps up to 2.8 GB. Is this memory pattern expected, and if so, is there a way to mitigate it other than increasing the amount of RAM available to the process?
The Lambda clients are written in Java using Lettuce 5.2.1.RELEASE and spring-data-redis 2.2.1.RELEASE.
Unless there is some additional Redis interaction within spring-data-redis, the client code is basically as follows:
public <T> T get(final String label, final RedisTemplate<String, ?> redisTemplate) {
    // All cached values live in a single hash at REDIS_KEY; each value is fetched by its label (an HGET).
    final BoundHashOperations<String, String, T> cache = redisTemplate.boundHashOps(REDIS_KEY);
    return cache.get(label);
}
There are no usages of RedisTemplate#keys in my codebase; the only interaction with Redis is via RedisTemplate#boundHashOps.
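For reference, the template is wired up roughly like this. This is a simplified sketch; the Jackson JSON hash-value serializer shown here is an assumption rather than necessarily our exact configuration, but the stored hash values are serialized JSON objects in any case.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.GenericJackson2JsonRedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

@Configuration
public class RedisTemplateConfig {

    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory connectionFactory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        template.setConnectionFactory(connectionFactory);
        // Plain string keys and hash keys; hash values are serialized JSON objects.
        template.setKeySerializer(new StringRedisSerializer());
        template.setHashKeySerializer(new StringRedisSerializer());
        template.setHashValueSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}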
Here is the output from redis-cli info memory before and after the spike:
# Memory
used_memory:31558400
used_memory_human:30.10M
used_memory_rss:50384896
used_memory_rss_human:48.05M
used_memory_peak:6498905008
used_memory_peak_human:6.05G
used_memory_peak_perc:0.49%
used_memory_overhead:4593040
used_memory_startup:4203584
used_memory_dataset:26965360
used_memory_dataset_perc:98.58%
allocator_allocated:32930040
allocator_active:34332672
allocator_resident:50593792
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:5140907060
maxmemory_human:4.79G
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.04
allocator_frag_bytes:1402632
allocator_rss_ratio:1.47
allocator_rss_bytes:16261120
rss_overhead_ratio:1.00
rss_overhead_bytes:-208896
mem_fragmentation_ratio:1.60
mem_fragmentation_bytes:18826560
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:269952
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
# Memory
used_memory:4939687896
used_memory_human:4.60G
used_memory_rss:4754452480
used_memory_rss_human:4.43G
used_memory_peak:6498905008
used_memory_peak_human:6.05G
used_memory_peak_perc:76.01%
used_memory_overhead:4908463998
used_memory_startup:4203584
used_memory_dataset:31223898
used_memory_dataset_perc:0.63%
allocator_allocated:5017947040
allocator_active:5043314688
allocator_resident:5161398272
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:5140907060
maxmemory_human:4.79G
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.01
allocator_frag_bytes:25367648
allocator_rss_ratio:1.02
allocator_rss_bytes:118083584
rss_overhead_ratio:0.92
rss_overhead_bytes:-406945792
mem_fragmentation_ratio:0.96
mem_fragmentation_bytes:-185235352
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:4904133550
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
Having discussed this with AWS support, we learned that the cause of this memory spike is that each of the ~1000 Lambda clients fills up a read buffer (the server-side client output buffer) with ~5 MB of data, because the values we are storing in Redis are large serialized JSON objects.
Their recommendations are to either:
Add 2-3 replicas to the cluster and use the replica nodes for read requests; the reader endpoint can be used to load balance those requests (a sketch of this follows the list), or
Control the client output buffers via the client-output-buffer-limit parameters, but note that clients will be disconnected if they reach the buffer limit.
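For the replica option, here is a minimal sketch of how the Lettuce connection factory could be pointed at replicas for reads with spring-data-redis. The endpoints below are placeholders, and the static master/replica topology configuration is an assumption about our setup, not a definitive recipe.

import io.lettuce.core.ReadFrom;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStaticMasterReplicaConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class ReplicaReadConfig {

    @Bean
    public LettuceConnectionFactory redisConnectionFactory() {
        // Placeholder endpoints -- substitute the real ElastiCache primary/replica endpoints.
        RedisStaticMasterReplicaConfiguration endpoints =
                new RedisStaticMasterReplicaConfiguration("primary.example.cache.amazonaws.com", 6379);
        endpoints.addNode("replica-1.example.cache.amazonaws.com", 6379);
        endpoints.addNode("replica-2.example.cache.amazonaws.com", 6379);

        // Route read-only commands (e.g. HGET) to a replica when one is available.
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
                .readFrom(ReadFrom.REPLICA_PREFERRED)
                .build();

        return new LettuceConnectionFactory(endpoints, clientConfig);
    }
}

This spreads the output-buffer load across the replicas rather than eliminating it, so the buffer-limit parameters may still be relevant.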
Given these constraints and our usage profile, we're actually going to switch to S3 in this instance.