I have a running cluster and I would like to add a data node into it. The running cluster is
x.x.x.246
and the data node is
x.x.x.99
each server can see each other by ping. Machine OS: CentOS7 Elasticsearch: 7.61
here is elasticsearch.yml of x.x.x.246:
cluster.name: elasticsearch
node.master: true
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.246
http.port: 9200
discovery.seed_hosts: ["x.x.x.99:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
here is elasticsearch.yml of x.x.x.99
cluster.name: elasticsearch
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.99
http.port: 9200
discovery.seed_hosts: ["x.x.x.245:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
When I run systemctl start elasticsearch
on each machine, it works well.
curl -X GET "X.X.X.246:9200/_cluster/health?pretty"
show:number of the node not changing
curl -X GET "X.X.X.99:9200/_cluster/health?pretty
show:
{
"error" : {
"root_cause" : [
{
"type" : "master_not_discovered_exception",
"reason" : null
}
],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
here is elasticsearch.yml of x.x.x.246:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99","x.x.x.246]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
here is elasticsearch.yml of x.x.x.99
cluster.name: elasticsearch
node.name: node
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246","x.x.x.99"]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
log on x.x.x.99:
[root@dev ~]# tail -30 /var/log/elasticsearch/elasticsearch.log
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
[2020-03-19T12:12:04,462][INFO ][o.e.c.c.JoinHelper ] [node-1] failed to join {master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, optionalJoin=Optional[Join{term=178, lastAcceptedTerm=8, lastAcceptedVersion=100, sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, targetNode={master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][10.64.2.246:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
at org.elasticsearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:514) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:244) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][10.64.2.99:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid P4QlwvuRRGSmlT77RroSjA than local cluster uuid oUoIe2-bSbS2UPg722ud9Q, rejecting
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:148) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
For node x.x.x.99
the entry for seed host is wrong. It should be as below:
discovery.seed_hosts: ["x.x.x.246:9300"]
The discovery.seed_hosts
list is used to detect the master node, since this list contains the address to the nodes which are master eligible nodes and hold the information of the current master node as well, Since it is pointed to x.x.x.245
instead of x.x.x.246
in the configuration of x.x.x.99
, the node x.x.x.99
is unable to detect the master.
Post discussion in comment correct configuration should be:
Master node:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246]
cluster.initial_master_nodes: ["master"]
Note that if you want the above node to be master only and not hold data then set
node.data: false
Data node:
cluster.name: elasticsearch
node.name: data-node-1
node.data: true
node.master: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246"]
Also since node x.x.x.99
could not join cluster it has stale cluster state. So delete data
folder on x.x.x.99
and restart this node.