
Why are reads/sec so much lower than writes/sec in Aerospike?


I am using Aerospike v4.8 and making both read and write requests. My write requests reach a throughput of about 4000 writes/sec, whereas my reads reach only 10-15 reads/sec, which is very low.

My query is:

let query = aerospikeClient.query(nameSpace, set)
query.select('count', 'targetKey')
query.predexp = [
    // campaignKey == Id1:Id2:Id3:channel
    predexp.stringBin('campaignKey'),
    predexp.stringValue(Id1 + ':' + Id2 + ':' + Id3 + ':' + channel),
    predexp.stringEqual(),

    // epochDay >= the given day
    predexp.integerBin('epochDay'),
    predexp.integerValue(epochDay),
    predexp.integerGreaterEq(),

    // epochDay <= the given day (with the clause above: epochDay equals that day)
    predexp.integerBin('epochDay'),
    predexp.integerValue(epochDay),
    predexp.integerLessEq(),

    // AND the three clauses above
    predexp.and(3)
]

I can't tell what is wrong here. Any help would be appreciated.

My Config is:

namespace test {
        replication-factor 2
        memory-size 8G
        default-ttl 7d 
        storage-engine device {
                device /dev/xvdf
                scheduler-mode noop
                write-block-size 16K
                data-in-memory false
        }
}

Indexes are:

CREATE INDEX campaignIndex ON antiSpamming.userTargetingMatrix (campaignKey) string;
CREATE INDEX targetIndex ON antiSpamming.userTargetingMatrix (targetKey) string;
CREATE INDEX epochDayIndex ON antiSpamming.userTargetingMatrix (epochDay) NUMERIC;

Solution

  • First of all, the premise isn't true in general. For comparable single-record operations, Aerospike reads are faster than writes, because a write has a longer code path and more IO. Unless you state that your operation is a REPLACE, a write behaves as an upsert: it first tries to read the existing record, merges your data in, then writes the result out.
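To make the upsert-versus-replace distinction concrete, here is a minimal sketch using the Node.js client's `exists` write policy. It assumes the `Aerospike` module (from `require('aerospike')`) and an already connected `client` are passed in. The helper name `replaceRecord` and its arguments are hypothetical, not part of the client API.

```javascript
// Minimal sketch: a single-record write that replaces the record outright.
// `Aerospike` is the module from require('aerospike'); `client` is a connected client.
// replaceRecord is a hypothetical helper name used only for illustration.
function replaceRecord (Aerospike, client, namespace, set, userKey, bins) {
  const key = new Aerospike.Key(namespace, set, userKey)
  // CREATE_OR_REPLACE replaces the whole record if it exists and creates it otherwise.
  // Plain REPLACE would fail when the record does not exist yet.
  // Without an `exists` policy the write behaves as an upsert (merge into the record).
  const policy = { exists: Aerospike.policy.exists.CREATE_OR_REPLACE }
  return client.put(key, bins, {}, policy)
}
```

Whether this helps depends on the workload: it only applies when every write carries the complete set of bins for the record.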

    What you are doing above isn't an apples-to-apples comparison. A write (put) is a single-record operation, so you should compare it to a single-record read (get). What you're running is a scan with a predicate filter (it would be a query if you also attached a secondary index filter), which is a multi-node operation. Even if it returns just a single record, it has to go to every node and, on each one, walk the entire primary index looking for matches to your predicate filter.
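For reference, the read that is actually comparable to a put is a lookup by key. The sketch below assumes the `Aerospike` module and a connected `client` are passed in. `readOne` is a hypothetical helper name, and the user key is whatever key the record was written under.

```javascript
// Minimal sketch: a single-record read by key.
// The client routes it to the one node that owns the record, unlike a scan,
// which visits every node.
// readOne is a hypothetical helper name used only for illustration.
function readOne (Aerospike, client, namespace, set, userKey) {
  const key = new Aerospike.Key(namespace, set, userKey)
  // Called without a callback, get() returns a Promise for the record.
  return client.get(key)
}
```

Measuring the throughput of calls like this against the 4000 writes/sec figure would give the like-for-like comparison.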

    There are a few ways around that. First, you can build a secondary index on your epochDay value and, instead of a predicate filter on it, use a secondary index filter with the BETWEEN range predicate. The predicate filter then shrinks to just the string predicate.
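A rough sketch of that first suggestion with the Node.js client follows. It assumes the `Aerospike` module and a connected `client` are passed in, and that the predexp helpers used in the question are reachable as `Aerospike.predexp`. The remaining predicate is attached through `query.predexp`, as in the question. `buildCampaignQuery` is a hypothetical helper name.

```javascript
// Minimal sketch: range on epochDay via the secondary index,
// string comparison left as the only predicate filter.
// buildCampaignQuery is a hypothetical helper name used only for illustration.
function buildCampaignQuery (Aerospike, client, namespace, set, campaignKey, fromDay, toDay) {
  // Assumption: the question's `predexp` variable is this module.
  const predexp = Aerospike.predexp
  const query = client.query(namespace, set)
  query.select('count', 'targetKey')
  // Secondary index filter: epochDay BETWEEN fromDay AND toDay (both ends inclusive).
  query.where(Aerospike.filter.range('epochDay', fromDay, toDay))
  // Remaining predicate filter: campaignKey equals the given string.
  query.predexp = [
    predexp.stringBin('campaignKey'),
    predexp.stringValue(campaignKey),
    predexp.stringEqual()
  ]
  return query
}
```

For the single-day lookup in the question, `fromDay` and `toDay` would both be `epochDay`. The returned query can then be run the same way as the original one.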

    Second, you could model the data differently: consolidate it into a single larger record as a list or map, and use the list or map API to get the range of elements you want from that complex data type. Take a look at the Aerospike developer blog and Aerospike code examples.
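One possible shape for that modeling approach is sketched below: a single record holding a map bin keyed by epochDay, updated and read through the Node.js client's map operations. The record layout, the bin name `countsByDay`, and both helper names are hypothetical. The `Aerospike` module and a connected `client` are assumed to be passed in, and `key` is an `Aerospike.Key` for the consolidated record.

```javascript
// Minimal sketch: one record with a map bin { epochDay -> count }.
// The bin name and helper names are hypothetical, chosen for illustration.
const DAY_COUNTS_BIN = 'countsByDay'

// Add `by` to the counter stored under `epochDay` in the map bin.
function incrementDay (Aerospike, client, key, epochDay, by) {
  const maps = Aerospike.maps
  return client.operate(key, [maps.increment(DAY_COUNTS_BIN, epochDay, by)])
}

// Read the entries for fromDay..toDay (inclusive) from a single record.
function readDayRange (Aerospike, client, key, fromDay, toDay) {
  const maps = Aerospike.maps
  // getByKeyRange takes an inclusive begin and an exclusive end, hence toDay + 1.
  return client.operate(key, [
    maps.getByKeyRange(DAY_COUNTS_BIN, fromDay, toDay + 1, maps.returnType.KEY_VALUE)
  ])
}
```

Both helpers are single-record operations, so each call goes to one node rather than fanning out across the cluster.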