
memsql-deploy of a leaf node consistently fails


Deploying a leaf node with memsql-deploy on the same host as the master always fails with the same error. Repeating the operation on new machines produces the same failure.

Here are the steps to deploy the master role:

# memsql-ops memsql-deploy -a Af53bfb  -r master -P 3306 --community-edition
2017-03-24 16:15:54: Je5725b [INFO] Deploying MemSQL to 172.17.0.3:3306
2017-03-24 16:15:59: Je5725b [INFO] Installing MemSQL
2017-03-24 16:16:02: Je5725b [INFO] Finishing MemSQL Install
Waiting for MemSQL to start...
MemSQL successfully started

Here is the step to add a leaf node immediately after deploying the master:

# memsql-ops memsql-deploy -a Af53bfb  -r leaf -P 3308       
2017-03-24 16:16:43: J32c71f [INFO] Deploying MemSQL to 172.17.0.3:3308
2017-03-24 16:16:43: J32c71f [INFO] Installing MemSQL
2017-03-24 16:16:46: J32c71f [INFO] Finishing MemSQL Install
Waiting for MemSQL to start...
MemSQL failed to start: Failed to start MemSQL:

        set_mempolicy: Operation not permitted
setting membind: Operation not permitted

What are the possible reasons for these error messages, and how can I find the root cause or fix it?


Solution

  • After a day of searching on Google, I believe I have finally located the root cause of this error. I find it strange that no one has asked about it before, because it should happen to more people than just me.

    The real cause is that I installed the numactl package, as MemSQL's best practices suggest, on a non-NUMA machine. With numactl installed, every MemSQL node other than the first is started through numactl, which uses the set_mempolicy system call to apply NUMA binding to individual MemSQL nodes, and that call fails. As a result, starting the node with the memsql-start or memsql-deploy sub-commands of memsql-ops also fails.

    The workaround is very simple: remove the numactl package, and everything works. This workaround applies in particular to virtualization-based MemSQL deployments such as Docker.
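
Before removing the package, it may help to confirm that memory binding is what fails on the host. The following is a minimal sketch, not something taken from memsql-ops: it assumes that running a trivial command under `numactl --membind=0` exercises the same set_mempolicy call that fails above. The function name `check_membind` and the `NUMACTL_BIN` override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: classify whether NUMA memory binding works in this environment.
# NUMACTL_BIN is a hypothetical override (not a numactl or memsql-ops
# setting) that lets the check be exercised with a stand-in command.
check_membind() {
    bin="${NUMACTL_BIN:-numactl}"
    if ! command -v "$bin" >/dev/null 2>&1; then
        # numactl is absent, so nothing can attempt the binding
        echo "not-installed"
    elif "$bin" --membind=0 true >/dev/null 2>&1; then
        # binding memory to node 0 and running `true` succeeded
        echo "ok"
    else
        # numactl exists but the binding was refused
        echo "blocked"
    fi
}

check_membind
```

If this prints `blocked` on the affected host or container, that would be consistent with the diagnosis above, and removing numactl with the distribution's package manager should then give `not-installed`.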