One of my SQL nodes failed with a "Too many connections" error. When I checked the status of the cluster with ndb_mgm, I realized that data node 2 had shut down, so I restarted it and then restarted the SQL node. After 2 hours in the "starting" state, data node 2 came up. Now, however, when I run the query select count(*) from summantions, the first run returns roughly 107,000,000 rows, but running it again returns about 90,000,000.
The memory usage on the data nodes is as follows:
Node  Memory               Used (GB)  Available (GB)
2     Data memory              39.16           40.84
2     Index memory              3.28            2.06
2     Long message buffer       0.00            0.03
3     Data memory              44.92           35.08
3     Index memory              3.76            1.58
3     Long message buffer       0.00            0.03
Data node 2 uses less memory than data node 3 (39.16 GB vs. 44.92 GB of data memory, and 3.28 GB vs. 3.76 GB of index memory).
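The gap can be quantified with a few lines of arithmetic on the figures in the table. Both nodes have the same configured totals (80.00 GB of data memory and 5.34 GB of index memory), so the difference is entirely in usage. Assuming the two data nodes form a single node group with two replicas (the question does not say so explicitly), each node should hold a full copy of the data, which makes a gap of several GB at least a hint worth investigating. The dictionary layout and the `summarize` function are illustrative only and not part of any NDB tool.

```python
# Figures copied from the memory report above, in GB: (used, available) per node.
MEMORY = {
    2: {"Data memory": (39.16, 40.84), "Index memory": (3.28, 2.06)},
    3: {"Data memory": (44.92, 35.08), "Index memory": (3.76, 1.58)},
}


def summarize(memory):
    """Return {resource: (total_per_node, used_per_node, node3_minus_node2_used)}."""
    result = {}
    for resource in ("Data memory", "Index memory"):
        # Configured total = used + available; rounded to the report's precision.
        totals = {node: round(sum(memory[node][resource]), 2) for node in memory}
        used = {node: memory[node][resource][0] for node in memory}
        gap = round(used[3] - used[2], 2)
        result[resource] = (totals, used, gap)
    return result


if __name__ == "__main__":
    for resource, (totals, used, gap) in summarize(MEMORY).items():
        for node in sorted(used):
            share = 100 * used[node] / totals[node]
            print(f"node {node} {resource}: {used[node]:.2f} of {totals[node]:.2f} GB used ({share:.1f}%)")
        print(f"  node 3 uses {gap:.2f} GB more {resource.lower()} than node 2")
```

On its own, a memory difference does not prove that the replicas diverged; it only supports the suspicion raised by the changing row counts.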
My guess is that the first run gets its result from one data node and the second run gets it from the other data node.
If that is the case, how can I synchronize the data nodes and merge their data?
This sounds very much like a bug. A node that you think is corrupt is resynchronized by performing an initial node restart: this clears the node's state, after which it synchronizes with the other node, which kept its data.
Obviously, though, it is important to verify that the data really is corrupt first.
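The "verify first, then do an initial node restart" advice can be sketched as a small helper. It runs the same count several times through a caller-supplied function (hypothetical here: `run_count` stands for whatever executes `select count(*) from summantions` through your MySQL driver) and, only if the results disagree, builds the ndb_mgm command for an initial restart of the node you believe is corrupt. `node_id RESTART -i` is the ndb_mgm client command for an initial node restart, and `-c`/`-e` are ndb_mgm's connectstring and execute options; the function names are illustrative. Which node to restart is the operator's choice (node 2 in this question is only a guess based on its lower memory use). Because an initial restart deletes that node's local file system, the sketch defaults to `execute=False` and merely returns the command.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple


def counts_agree(counts: Sequence[int]) -> bool:
    """True if every repeated run of the same COUNT(*) returned the same value."""
    return len(set(counts)) <= 1


def sample_counts(run_count: Callable[[], int], runs: int = 5) -> List[int]:
    """Call the (hypothetical) count function `runs` times and collect results."""
    return [run_count() for _ in range(runs)]


def initial_restart_argv(node_id: int, connectstring: Optional[str] = None) -> List[str]:
    """Build an ndb_mgm invocation that performs an initial restart of one data node."""
    argv = ["ndb_mgm"]
    if connectstring:
        argv += ["-c", connectstring]  # management server, e.g. "mgmhost:1186"
    argv += ["-e", f"{node_id} RESTART -i"]
    return argv


def resync_if_inconsistent(
    run_count: Callable[[], int],
    node_id: int,
    connectstring: Optional[str] = None,
    runs: int = 5,
    execute: bool = False,
) -> Tuple[List[int], Optional[List[str]]]:
    """Return (observed counts, restart command or None if the counts agree)."""
    counts = sample_counts(run_count, runs)
    if counts_agree(counts):
        return counts, None  # no evidence of divergence: leave the node alone
    argv = initial_restart_argv(node_id, connectstring)
    if execute:
        subprocess.run(argv, check=True)  # wipes and resyncs node_id
    return counts, argv
```

For example, `resync_if_inconsistent(my_count, node_id=2)` would return the alternating counts together with `["ndb_mgm", "-e", "2 RESTART -i"]`, leaving the decision to run it with you.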