We are running a 3-node cluster, with data in memory, on version 4.2.0.4 CE on AWS. We recently noticed that writes were not happening and found one node down. Ideally, writes should still happen with one node down. Once we started the node that was down, writes resumed. We are accessing the Aerospike cluster from outside AWS.
We found the INFO log line below being printed continuously on two nodes.
INFO (hb): (hb.c:4319) found redundant connections to same node, fds 101 31 - choosing at random
On the other node, no logs are being printed, and the asadm stats show no reads or writes happening. We have also observed that records are unevenly distributed across the nodes.
Below is the network stanza of the configuration file, which is the same on all 3 servers.
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port 13.xxx.xxx.xxx 3002
        mesh-seed-address-port 13.xxx.xxx.xxx 3002
        mesh-seed-address-port 13.xxx.xxx.xxx 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace smpa {
    replication-factor 2
    memory-size 12G
    storage-engine memory
    single-bin true
    high-water-memory-pct 80
    stop-writes-pct 90
}
$ asadm -e "show stat like stop_writes"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics (2019-01-24 12:24:42 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : node5.domain.com:3000 node6.domain.com:3000 node7.domain.com:3000
cluster_clock_skew_stop_writes_sec: 0 0 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa Namespace Statistics (2019-01-24 12:24:42 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : node5.domain.com:3000 node6.domain.com:3000 node7.domain.com:3000
clock_skew_stop_writes: false false false
stop_writes : false false false
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Statistics (2019-01-24 12:24:42 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : node5.domain.com:3000 node6.domain.com:3000 node7.domain.com:3000
clock_skew_stop_writes: false false false
stop_writes : false false false
$ asadm -e "show stat like x_partitions"
Seed: [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa Namespace Statistics (2019-01-24 12:30:01 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : node5.domain.com:3000 node6.domain.com:3000 node7.domain.com:3000
migrate_rx_partitions_active : 0 0 0
migrate_rx_partitions_initial : 0 2749 0
migrate_rx_partitions_remaining: 0 0 0
migrate_tx_partitions_active : 0 0 0
migrate_tx_partitions_imbalance: 0 0 0
migrate_tx_partitions_initial : 1396 0 1353
migrate_tx_partitions_remaining: 0 0 0
$ asadm -e "show pmap"
Seed: [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Partition Map Analysis (2019-01-24 12:33:39 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cluster Namespace Node Primary Secondary Dead Unavailable
Key . . Partitions Partitions Partitions Partitions
BEF4A1479187 smpa node6.domain.com:3000 1382 1367 0 0
BEF4A1479187 smpa node7.domain.com:3000 1358 1342 0 0
BEF4A1479187 smpa node5.domain.com:3000 1356 1387 0 0
BEF4A1479187 test node6.domain.com:3000 1382 0 0 0
BEF4A1479187 test node7.domain.com:3000 1358 0 0 0
BEF4A1479187 test node5.domain.com:3000 1356 0 0 0
Number of rows: 6
$ asadm -e "show stat like objects"
Seed: [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics (2019-01-24 12:34:09 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : node5.domain.com:3000 node6.domain.com:3000 node7.domain.com:3000
objects : 6478039 6485049 9265180
sindex_gc_objects_validated: 0 0 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa Namespace Statistics (2019-01-24 12:34:09 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : node5.domain.com:3000 node6.domain.com:3000 node7.domain.com:3000
evicted_objects : 0 0 0
expired_objects : 0 0 0
master_objects : 2944752 3456686 4712696
non_expirable_objects: 2943325 3455765 4711880
non_replica_objects : 0 0 0
objects : 6478039 6485049 9265180
prole_objects : 3533287 3028363 4552484
$ asadm -e "info"
Seed: [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2019-01-25 06:54:14 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node Node Ip Build Cluster Migrations Cluster Cluster Principal Client Uptime
. Id . . Size . Key Integrity . Conns .
ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 BB9BE0093E32B0A xx.xxx.xxx.xxx:3000 C-4.2.0.4 3 0.000 3ADA511969DD True BB9EAC87115AD0A 59 01:09:24
ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 *BB9EAC87115AD0A xx.xxx.xxx.xxx:3000 C-4.2.0.4 3 0.000 3ADA511969DD True BB9EAC87115AD0A 59 01:05:17
ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 BB9D4175485B10A xx.xxx.xxx.xxx:3000 C-4.2.0.4 3 0.000 3ADA511969DD True BB9EAC87115AD0A 59 01:14:17
Number of rows: 3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2019-01-25 06:54:14 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Expirations,Evictions Stop Disk Disk HWM Avail% Mem Mem HWM Stop
. . Records . Writes Used Used% Disk% . Used Used% Mem% Writes%
smpa ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 2.716 M (0.000, 0.000) false N/E N/E 50 N/E 2.774 GB 24 80 90
smpa ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 2.648 M (0.000, 0.000) false N/E N/E 50 N/E 2.706 GB 23 80 90
smpa ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 2.709 M (0.000, 0.000) false N/E N/E 50 N/E 2.767 GB 24 80 90
smpa 8.074 M (0.000, 0.000) 0.000 B 8.247 GB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-01-25 06:54:14 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace Node Total Repl Objects Tombstones Pending Rack
. . Records Factor (Master,Prole,Non-Replica) (Master,Prole,Non-Replica) Migrates ID
. . . . . . (tx,rx) .
smpa ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 2.716 M 2 (1.375 M, 1.341 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
smpa ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 2.648 M 2 (1.311 M, 1.337 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
smpa ec2-xx-xxx-xxx-xxx.ap-south-1.compute.amazonaws.com:3000 2.709 M 2 (1.351 M, 1.359 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000) 0
smpa 8.074 M (4.037 M, 4.037 M, 0.000) (0.000, 0.000, 0.000) (0.000, 0.000)
$ asadm -e "show stat like objects"
Seed: [('127.0.0.1', 3000, None)]
Config_file: /home/web/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190122 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000
objects: 672400 662491 671131
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190121 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000
objects: 376064 347232 374700
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190124 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000
objects: 629323 617983 628214
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190123 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000
objects: 739556 726447 736871
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa d190125 Set Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000
objects: 313800 308814 313320
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000
objects : 2731143 2662967 2724236
sindex_gc_objects_validated: 0 0 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~smpa Namespace Statistics (2019-01-25 07:07:30 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000 ec2-xx.xxx.xxx.xxx.ap-south-1.compute.amazonaws.com:3000
evicted_objects : 0 0 0
expired_objects : 0 0 0
master_objects : 1382413 1318579 1358181
non_expirable_objects: 1382525 1318691 1358445
non_replica_objects : 0 0 0
objects : 2731143 2662967 2724236
prole_objects : 1348730 1344388 1366055
The issue was that I had provided NATed IPs for heartbeat communication. "mesh-seed-address-port" should use the private IPs, and "access-address" should be set to the NATed IP if your client is outside the network. Please go through the above threads if required.
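For anyone hitting the same thing, here is a rough sketch of what the corrected network stanza could look like on one node. It is only an illustration of the fix described above, not a verified configuration: the 10.0.0.x addresses are hypothetical private IPs that you would replace with your own, and 13.xxx.xxx.xxx stands for that node's own NATed IP (masked as in the original post). Ports and heartbeat timings are unchanged from the stanza posted earlier.

```
network {
    service {
        address any
        port 3000
        # Assumption: the NATed IP of THIS node, advertised to clients outside AWS.
        access-address 13.xxx.xxx.xxx
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # Assumption: hypothetical private IPs of the cluster nodes; use your own.
        mesh-seed-address-port 10.0.0.11 3002
        mesh-seed-address-port 10.0.0.12 3002
        mesh-seed-address-port 10.0.0.13 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
```

With this layout, heartbeat traffic between nodes stays on the private network, while clients outside AWS are given the NATed address. Each node needs its own value for "access-address".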
Here is clear documentation on how to configure IP addressing on AWS EC2 instances: https://discuss.aerospike.com/t/aws-ec2-ip-addressing-for-aerospike/2424
Thanks a lot to kporter, pgupta & ashish-shinde for their valuable help.