cassandra, jvm, database-tuning, cassandra-2.1

Optimal JVM settings for Cassandra


I have a 4 node cluster with 16 core CPU and 100 GB RAM on each box (2 nodes on each rack).

As of now, all are running with default JVM settings of Cassandra (v2.1.4). With this setting, each node uses 13GB RAM and 30% CPU. It is a write heavy cluster with occasional deletes or updates.

Do I need to tune the JVM settings of Cassandra to utilize more memory? What all things should I be looking at to make appropriate settings?


Solution

  • Do I need to tune the JVM settings of Cassandra to utilize more memory?

    The DataStax Tuning Java Resources doc actually has some pretty sound advice on this:

    Many users new to Cassandra are tempted to turn up Java heap size too high, which consumes the majority of the underlying system's RAM. In most cases, increasing the Java heap size is actually detrimental for these reasons:

    • In most cases, the capability of Java to gracefully handle garbage collection above 8GB quickly diminishes.
    • Modern operating systems maintain the OS page cache for frequently accessed data and are very good at keeping this data in memory, but can be prevented from doing its job by an elevated Java heap size.

    If you have more than 2GB of system memory, which is typical, keep the size of the Java heap relatively small to allow more memory for the page cache.

    Since you have 100GB of RAM on each machine (and if you are indeed running under the "default JVM settings"), your JVM max heap size should already be capped at 8192M. I wouldn't deviate from that unless you are experiencing issues with garbage collection.
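
To see where the 8192M figure comes from, here is a rough sketch of the default sizing arithmetic applied to the numbers in the question. It assumes 100GB means 102400MB and 16 cores per node; `default_heap_sizes` is a hypothetical helper name of mine, not something that ships with Cassandra.

```shell
#!/bin/bash
# Sketch of the default sizing rules, parameterized by RAM (MB) and core count.
# default_heap_sizes is a hypothetical helper, not part of Cassandra.
default_heap_sizes() {
    local mem_mb=$1 cores=$2
    local half=$(( mem_mb / 2 ))
    local quarter=$(( mem_mb / 4 ))

    # Heap: max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB))
    [ "$half" -gt 1024 ] && half=1024
    [ "$quarter" -gt 8192 ] && quarter=8192
    local heap=$(( half > quarter ? half : quarter ))

    # Young gen: min(100MB * cores, 1/4 heap)
    local yg_by_cores=$(( 100 * cores ))
    local yg=$(( heap / 4 ))
    [ "$yg" -gt "$yg_by_cores" ] && yg=$yg_by_cores

    echo "MAX_HEAP_SIZE=${heap}M HEAP_NEWSIZE=${yg}M"
}

# Assumed values for the cluster in the question: 102400MB RAM, 16 cores.
default_heap_sizes 102400 16
# prints: MAX_HEAP_SIZE=8192M HEAP_NEWSIZE=1600M
```

The quarter-of-RAM term hits its 8192MB ceiling long before 100GB, so adding RAM beyond 32GB does not change the default heap.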

    JVM resources for Cassandra can be set in the cassandra-env.sh file. If you are curious, look at the code for cassandra-env.sh and find the calculate_heap_sizes() shell function. That should give you some insight into how Cassandra computes your default JVM settings.
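
If you ever do need to override the computed defaults, cassandra-env.sh has two commented-out variables for that. A minimal sketch of what the override might look like is below. The values are only illustrative (they match the defaults a 16-core, 100GB box would get anyway), so treat them as an assumption rather than a recommendation.

```shell
# In cassandra-env.sh: uncomment and set BOTH values, or leave both commented
# out so that calculate_heap_sizes() picks them. Values here are illustrative.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="1600M"
```

The stock script expects these to be set or unset as a pair, and a restart of the node is needed for the change to take effect.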

  • What all things should I be looking at to make appropriate settings?

    If you are running OpsCenter (and you should be), add a graph for "Heap Used" and "Non Heap Used."

    [Screenshot: OpsCenter graphing Heap Used and Non Heap Used together]

    This will allow you to easily monitor JVM heap usage for your cluster. Another thing that helped me was to write a bash script in which I basically hijacked the JVM calculations from cassandra-env.sh. That way I can run it on a new machine and know right away what my MAX_HEAP_SIZE and HEAP_NEWSIZE are going to be:

    #!/bin/bash
    clear
    echo "This is how Cassandra will determine its default Heap and GC Generation sizes."
    
    system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
    half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
    quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`
    
    echo "   memory = $system_memory_in_mb"
    echo "     half = $half_system_memory_in_mb"
    echo "  quarter = $quarter_system_memory_in_mb"
    
    echo "cpu cores = "`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`
    
    #cassandra-env logic duped here
    #this should help you to see how much memory is being allocated
    #to the JVM
        if [ "$half_system_memory_in_mb" -gt "1024" ]
        then
            half_system_memory_in_mb="1024"
        fi
        if [ "$quarter_system_memory_in_mb" -gt "8192" ]
        then
            quarter_system_memory_in_mb="8192"
        fi
        if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
        then
            max_heap_size_in_mb="$half_system_memory_in_mb"
        else
            max_heap_size_in_mb="$quarter_system_memory_in_mb"
        fi
        MAX_HEAP_SIZE="${max_heap_size_in_mb}M"
    
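        # Young gen: min(max_sensible_per_modern_cpu_core * num_cores, 1/4 * heap size)
        # system_cpu_cores must be set before it is used in the calculation below
        system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`
        max_sensible_yg_per_core_in_mb="100"
        # quote the * so the shell passes it to expr instead of globbing it
        max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb "*" $system_cpu_cores`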
    
        desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
        if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
        then
            HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
        else
            HEAP_NEWSIZE="${desired_yg_in_mb}M"
        fi
    
    echo "Max heap size = " $MAX_HEAP_SIZE
    echo " New gen size = " $HEAP_NEWSIZE
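
If OpsCenter is not available, a quick spot check of actual heap usage is also possible from the command line. The sketch below assumes that `nodetool info` prints a line starting with "Heap Memory (MB)" followed by "used / max", which is how 2.1 formats it as far as I know. `heap_from_nodetool_info` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: extract "used / max" heap (in MB) from `nodetool info` output on stdin.
# Assumes a line of the form "Heap Memory (MB)       : <used> / <max>".
# heap_from_nodetool_info is a hypothetical helper, not a Cassandra tool.
heap_from_nodetool_info() {
    # Anchor at line start so the "Off Heap Memory (MB)" line is not matched.
    awk -F':' '/^Heap Memory \(MB\)/ { gsub(/^[ \t]+/, "", $2); print $2 }'
}

# Usage on a running node:
#   nodetool info | heap_from_nodetool_info
```

Comparing the max value it reports against the MAX_HEAP_SIZE the script above predicts is an easy way to confirm a node really is running with the default settings.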
    

    Update 20160212:

    Also, be sure to check out Amy Tobey's 2.1 Cassandra Tuning Guide. She has some great tips on how to get your cluster running optimally.