Although there are multiple threads about this OOM issue, we would like to clarify a few things. We are running a 36-node Cassandra 3.11.6 cluster on Kubernetes, with 32 GiB allocated to each container.
The container is getting OOM-killed (note: not a Java heap OutOfMemoryError, but the Linux cgroup OOM killer) because it reaches the 32 GiB memory limit of its cgroup.
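A quick way to confirm it really is the cgroup OOM killer and not the JVM (the pod name below is just a placeholder):
# Pod status reports Reason: OOMKilled (exit code 137) when the cgroup killer terminated the container
kubectl describe pod cassandra-0 | grep -A3 'Last State'
# On the worker node, the kernel log records the cgroup OOM kill
dmesg -T | grep -iE 'oom-killer|memory cgroup out of memory'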
Stats and configs
map[limits:map[ephemeral-storage:2Gi memory:32Gi] requests:map[cpu:7 ephemeral-storage:2Gi memory:32Gi]]
Cgroup Memory limit
34359738368 bytes -> 32 GiB
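For reference, a rough sketch of reading the same limit and the current accounting from inside the container, assuming cgroup v1 (paths differ under cgroup v2):
cat /sys/fs/cgroup/memory/memory.limit_in_bytes    # 34359738368
cat /sys/fs/cgroup/memory/memory.usage_in_bytes    # current charge against the limit
grep -E '^(cache|rss|mapped_file) ' /sys/fs/cgroup/memory/memory.stat    # anonymous RSS vs page cache / mmapped files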
The JVM heap sizes auto-calculated by Cassandra: -Xms19660M -Xmx19660M -Xmn4096M
Cassandra Yaml --> https://pastebin.com/ZZLTc1cM
JVM Options --> https://pastebin.com/tjzZRZvU
nodetool info output from a node that is already consuming 98% of its memory limit
nodetool info
ID : 59c53bdb-4f61-42f5-a42c-936ea232e12d
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 179.71 GiB
Generation No : 1643635507
Uptime (seconds) : 9134829
Heap Memory (MB) : 5984.30 / 19250.44
Off Heap Memory (MB) : 1653.33
Data Center : datacenter1
Rack : rack1
Exceptions : 5
Key Cache : entries 138180, size 99.99 MiB, capacity 100 MiB, 9666222 hits, 10281941 requests, 0.940 recent hit rate, 14400 save period in seconds
Row Cache : entries 10561, size 101.76 MiB, capacity 1000 MiB, 12752 hits, 88528 requests, 0.144 recent hit rate, 900 save period in seconds
Counter Cache : entries 714, size 80.95 KiB, capacity 50 MiB, 21662 hits, 21688 requests, 0.999 recent hit rate, 7200 save period in seconds
Chunk Cache : entries 15498, size 968.62 MiB, capacity 1.97 GiB, 283904392 misses, 34456091078 requests, 0.992 recent hit rate, 467.960 microseconds miss latency
Percent Repaired : 8.28107989669628E-8%
Token : (invoke with -T/--tokens to see all 256 tokens)
What had been done
Questions
Although we know Cassandra does occupy off-heap space (bloom filters, memtables, etc.), that does not seem to account for all of the memory the container is consuming.
Our production cluster requires a lot of approvals, so deploying nodes with remote JConsole enabled is difficult. Are there other ways to account for this memory usage?
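For context, the kind of accounting available from the bundled tooling (jcmd is optional and needs a JDK in the image plus -XX:NativeMemoryTracking enabled):
# Per-node totals, heap vs off-heap -- the same figures shown above
nodetool info | grep -i heap
# Per-table off-heap breakdown: bloom filters, memtables, index summaries, compression metadata
nodetool tablestats | grep -i 'off heap'
# Optional: JVM native memory tracking, only if the JVM was started with -XX:NativeMemoryTracking=summary
# jcmd <cassandra-pid> VM.native_memory summary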
There's a good chance that the SSTables are getting mapped into memory (cached with mmap()). If that's the case, the growth wouldn't be immediate: memory usage climbs over time as SSTables are read and their pages are cached. I've written about this issue in https://dba.stackexchange.com/q/340515/255786.
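If you want to check this on a node, a rough sketch is to look at how many SSTable data files are mapped into the Cassandra process; paths and tool availability depend on your image, so adjust as needed:
CASS_PID=$(pgrep -f CassandraDaemon)
# number of *-Data.db files currently mmapped
grep -c 'Data.db' /proc/$CASS_PID/maps
# resident pages attributable to those mappings, in kB (pmap comes from procps)
pmap -x $CASS_PID | awk '/Data.db/ {sum += $3} END {print sum " kB resident in mapped Data.db files"}'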
There's an issue with a not-so-well-known configuration property called "disk access mode". When it's not set in cassandra.yaml, it defaults to mmap, which means that all SSTables get mmaped into memory. If so, you'll see an entry in the system.log on startup that looks like:
INFO [main] 2019-05-02 12:33:21,572 DatabaseDescriptor.java:350 - \
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
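A quick way to check is to grep the startup log; the path below is the common default and may differ in your image:
grep DiskAccessMode /var/log/cassandra/system.log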
The solution is to configure the disk access mode to only mmap the SSTable index files (not the *-Data.db component) by setting:
disk_access_mode: mmap_index_only
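Since the property is not well advertised, you will likely be adding the line to cassandra.yaml yourself, and it only takes effect after a restart. A rough sketch of rolling it out on Kubernetes (the StatefulSet name is a placeholder), then re-checking the DiskAccessMode entry in system.log on each node:
kubectl rollout restart statefulset/cassandra
kubectl rollout status statefulset/cassandra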
For more information, see the link I posted above. Cheers!
UPDATE 2024 - This answer previously linked to DataStax Community article #6947, which has been retired in favour of the Stack Exchange Network.