
Aerospike Hot Key error


Based on this link, I understand that a hot key error occurs when there are too many concurrent operations on the same key.

My current scenario:

I have a record that is updated every 5-10 seconds, and around 20 machines, each issuing about 10K queries per second, that read that record.

  • Question 1: Does the hot key error happen only with concurrent update transactions, or can it also happen with concurrent reads?
  • Question 2: Is the transaction-pending-limit mentioned in the above link a per-node limit or a limit for the whole cluster?
  • Question 3: From my reading, we should not increase transaction-pending-limit because it will impact performance. Can you give some performance numbers for comparison? And what is the maximum value that can be used for transaction-pending-limit?
  • Question 4: Is there any workaround for my scenario that does not impact performance, other than caching the record in the server?

Solution

  • 1- Both reads and updates.

  • 2- Per node. Every update goes to the node holding the master partition for that record. Reads go to that same node too, unless the client policy also allows reading from the nodes holding the replica partition(s).

  • 3- Hard to give numbers. Raising the limit leads to more client connections to the node that holds the hot key, which in turn can degrade performance, depending on the setup.

  • 4- The easiest option, if the use case permits, is to use the read replica client policy to spread reads across the master and replica partitions. Otherwise, create multiple keys, each holding a copy of the record, and spread the reads across them.
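
The "multiple keys" idea in point 4 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `store` dict stands in for the cluster, and the names `NUM_COPIES`, `copy_keys`, `write_all_copies` and `read_any_copy` are made up for this example and are not part of any Aerospike client API. With a real client, the two dict accesses would become put and get calls on the derived keys.

```python
import random

NUM_COPIES = 8  # hypothetical fan-out; tune it to the read load


def copy_keys(base_key, num_copies=NUM_COPIES):
    """Return the derived keys that hold identical copies of one logical record."""
    return [f"{base_key}:{i}" for i in range(num_copies)]


def write_all_copies(store, base_key, bins, num_copies=NUM_COPIES):
    """Writer path: update every copy (once per 5-10 s in the scenario above)."""
    for key in copy_keys(base_key, num_copies):
        store[key] = dict(bins)


def read_any_copy(store, base_key, num_copies=NUM_COPIES, rng=random):
    """Reader path: pick one copy at random so reads spread across the keys."""
    key = f"{base_key}:{rng.randrange(num_copies)}"
    return store[key]


# Stand-in for the cluster: a plain dict (an assumption for this sketch).
store = {}
write_all_copies(store, "config", {"version": 1})
sample = read_any_copy(store, "config")
```

Because each derived key is hashed on its own, the copies will generally land on different partitions, so the 200K reads per second from the scenario (20 machines at 10K each) are split across `NUM_COPIES` keys instead of queuing on one. The trade-off is that the copies are written one after another, so a reader may briefly see the previous version while an update is in progress.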