
Aerospike bad latencies with AWS


We have Aerospike running on bare-metal machines in SoftLayer as a 2-node cluster. Our average profile size is 1.5 KB, and at peak each node handles around 6,000 ops/sec. Latencies are fine: at peak, only about 5% of operations take more than 1 ms.

We then planned a migration to AWS, so we booted two i3.xlarge machines. We ran the benchmark with a 1.5 KB object size at 3x the load, and the results were satisfactory: around 4-5% of operations > 1 ms. But once we started actual processing, peak latencies jumped to 25-30% > 1 ms, and the most a node could accommodate was about 5K ops/sec. So we added one more node and ran the benchmark again (4.5 KB object size, 3x load); the results were 2-4% > 1 ms. After adding it to the cluster, the peak came down to 16-22%. We added one more node, and the peak is now at 10-15%.
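
For reference, a hedged sketch of the kind of benchmark run described above, using the Aerospike Java benchmark tool; the flag names are taken from the Java client's benchmarks README, and the key count and read/write split are assumptions rather than values from our setup:

# Hypothetical 3x-load run with ~1.5 KB byte-array objects -- verify flags against your client version.
# -o B:1536 -> ~1.5 KB binary objects; -w RU,80 -> read/update workload, 80% reads (assumed split)
# -z 16 -> client threads; -g 18000 -> target throughput, roughly 3x the 6K ops/sec per-node peak
./run_benchmarks -h h1 -p 13000 -n XXXX -k 10000000 -o B:1536 -w RU,80 -z 16 -g 18000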

The version in AWS is aerospike-server-community-3.15.0.2; the version in SoftLayer is Aerospike Enterprise Edition 3.6.3.

Our config is as follows:

#Aerospike database configuration file.
service {
    user xxxxx 
    group xxxxx 
    run-as-daemon
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid 
    service-threads 8 
    transaction-queues 8
    transaction-threads-per-queue 8
    proto-fd-max 15000
}

logging {

    #Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        port 13000
        address h1 reuse-address
    }

    heartbeat {
        mode mesh
        port 13001
        address h1

        mesh-seed-address-port h1 13001
        mesh-seed-address-port h2 13001
        mesh-seed-address-port h3 13001
        mesh-seed-address-port h4 13001
        interval 150
        timeout 10
    }

    fabric {
        port 13002
        address h1
    }

    info {
        port 13003
        address h1
    }
}

namespace XXXX { 
    replication-factor 2
    memory-size 27G
    default-ttl 10d
    high-water-memory-pct 70
    high-water-disk-pct 60
    stop-writes-pct 90
    storage-engine device {
        device /dev/nvme0n1
        scheduler-mode noop
        write-block-size 128K
    }
}

What should be done to bring down latencies in AWS?


Solution

  • This comes down to the difference in performance characteristics between the SSDs of the i3 nodes and what you had on SoftLayer. If you ran Aerospike on a floppy disk you'd get 0.5 TPS.

    Piyush's comment mentions ACT, the open source tool Aerospike created to benchmark SSDs with real database workloads. The point of ACT is to find the sustained rate at which the SSD can be relied on to deliver the latency you want. Burst rates don't matter much for databases.

    The performance engineering team at Aerospike has used ACT to find what the i3's 1900G SSD can do, and published the results in a post. Its ACT rating is 4x, meaning that the full 1900G SSD can do 8K reads/sec and 4K writes/sec with the standard 1.5K object size and 128K block size while staying at 95% < 1ms, 99% < 8ms, 99.9% < 64ms. That is not particularly good for an SSD. By comparison, a Micron 9200 PRO rates at 94.5x, nearly 24 times the TPS load. What's more, with the i3.xlarge you're sharing half that drive with a neighbor. There's no way to cap the IOPS so that you each get half; there's only a partition of the storage. This means you can expect latency spikes originating from the neighbor. The i3.2xlarge is the smallest instance that gives you the entire SSD.
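
    If you want to verify this on your own instance, ACT is driven by a small plain-text config describing the device and the target read/write rates. A minimal sketch is below; the parameter names follow the example configs in the ACT repository, so check them against the ACT version you install, and the 4x rates here simply replay the published i3 rating as an illustration:

        # prep the drive first with act_prep (this destroys the data on it),
        # then run the ACT binary against a config along these lines:
        device-names: /dev/nvme0n1
        test-duration-sec: 86400       # standard 24-hour ACT run
        read-reqs-per-sec: 8000        # 4x rating: 8K reads/sec
        write-reqs-per-sec: 4000       # 4x rating: 4K writes/sec
        record-bytes: 1536             # standard 1.5K object
        large-block-op-kbytes: 128     # matches the 128K write-block-size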

    So, you take the ACT information and use it for capacity planning. The main factors you need to know are the average object size (you can find that using the objsz histogram), the number of objects (again, available via asadm), and the peak read TPS and peak write TPS (how does the 60Ktps you mentioned split between reads and writes?).
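
    As a sketch of how to pull those inputs from a running 3.x cluster (exact syntax varies by tools version, so treat these as starting points):

        # per-namespace object counts and memory/disk usage
        asadm -e "info namespace"

        # object size histogram for the namespace (3.x-era syntax)
        asinfo -v "hist-dump:ns=XXXX;hist=objsz"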

    Check your logs for your cache-read-pct values. If they're in the range of 10% or higher, you should raise your post-write-queue value to get better read latencies (and also reduce IOPS pressure on the drive).
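
    For example, post-write-queue can be raised dynamically and then persisted in aerospike.conf; the 512 below is purely illustrative, size it against the 27G memory budget and the cache-read-pct you observe:

        # check recent cache hit rates reported in the log
        grep cache-read-pct /var/log/aerospike/aerospike.log | tail

        # raise the cache of recently written blocks at runtime (default is 256 blocks)
        asinfo -v "set-config:context=namespace;id=XXXX;post-write-queue=512"

        # persist it under storage-engine device { ... } in aerospike.conf:
        #     post-write-queue 512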