I have a Spring Boot app with three Elasticsearch clusters (ES v6.4.2) configured. The application.properties file looks like the following (I have three master nodes configured for every cluster but show only one here for simplicity):
# Cluster 1
spring.data.elasticsearch.cluster-one.cluster-name=<cluster-1-name>
spring.data.elasticsearch.cluster-one.cluster-nodes=<ip-cluster-1-master-node>:9300
# Cluster 2
spring.data.elasticsearch.cluster-two.cluster-name=<cluster-2-name>
spring.data.elasticsearch.cluster-two.cluster-nodes=<ip-cluster-2-master-node>:9300
# Cluster 3
spring.data.elasticsearch.cluster-three.cluster-name=<cluster-3-name>
spring.data.elasticsearch.cluster-three.cluster-nodes=<ip-cluster-3-master-node>:9300
spring.data.elasticsearch.repositories.enabled=true
spring.autoconfigure.exclude = org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchAutoConfiguration,org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchDataAutoConfiguration
For every cluster I have a separate configuration class where I set up the TransportClient and the ElasticsearchTemplate.
When I start the app locally with all three clusters running on my local machine, it starts up normally. But when I deploy the app to my test environment, which uses three separate remote clusters, startup takes more than 20 minutes. It seems to hang while loading the Elasticsearch plugins for the third cluster. Here is an excerpt from the log output:
2019-09-10 00:55:57.607 INFO 27505 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 2897 ms
2019-09-10 00:55:57.971 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : no modules loaded
2019-09-10 00:55:57.972 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
2019-09-10 00:55:57.972 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
2019-09-10 00:55:57.972 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
2019-09-10 00:55:57.972 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
2019-09-10 00:55:57.973 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.transport.Netty4Plugin]
2019-09-10 00:55:59.785 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : no modules loaded
2019-09-10 00:55:59.785 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
2019-09-10 00:55:59.785 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
2019-09-10 00:55:59.785 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
2019-09-10 00:55:59.785 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
2019-09-10 00:55:59.786 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.transport.Netty4Plugin]
2019-09-10 01:18:30.484 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : no modules loaded
2019-09-10 01:18:30.485 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
2019-09-10 01:18:30.485 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
2019-09-10 01:18:30.485 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
2019-09-10 01:18:30.485 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
2019-09-10 01:18:30.485 INFO 27505 --- [ main] o.elasticsearch.plugins.PluginsService : loaded plugin [org.elasticsearch.transport.Netty4Plugin]
Here you can see a delay of more than 20 minutes (00:55:59 to 01:18:30) between the second and third block of plugin loading.
When I curl the clusters from the test environment, they are all reachable and respond without delay.
What could be the reason for the delay, or where should I look?
Is it possible, or maybe even recommended, to load the Elasticsearch plugins only once for all three clusters? If yes, how can I achieve this?
EDIT:
The DEBUG logs show that the master nodes can't connect to the data nodes:
org.elasticsearch.transport.ConnectTransportException: [data_node_6][<ip-of-data-node>:9300] connect_exception
[...]
2019-09-10 18:49:00.517 DEBUG 26219 --- [main] o.e.c.t.TransportClientNodesService : failed to connect to discovered node [{data_node_6}{LKdxInfLSyqrGgSOXvTwFw}{YIhin3kpSNupEY1jBlHFVg}{<ip-of-data-node>}{<ip-of-data-node>:9300}{ml.machine_memory=33422729216, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]
But my cluster is online and in green state with all data nodes present. All nodes are configured in a mesh VPN, with ports 9200 and 9300 open for communication between the nodes.
Does ES need another port to be open for communication?
The problem occurred because I had enabled cluster sniffing (https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/transport-client.html), which picks up all data nodes in the cluster and communicates with them directly instead of going through the master nodes.
Since my cluster runs inside a VPN and only the master nodes can be reached from the backend (which is outside the VPN), the backend cannot communicate with the data nodes: it gets their internal VPN IPs (which are not public IPs) from the master nodes, hence the connection failures.
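To confirm this kind of failure from the backend host, it can help to probe the transport port directly, since a successful curl against port 9200 on a master node says nothing about port 9300 on a data node. The following is only a minimal sketch using the plain JDK; the class name TransportPortProbe, the method canConnect and the 3000 ms timeout are my own illustrative choices and not part of Elasticsearch or Spring.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Elasticsearch or Spring.
final class TransportPortProbe {

    // Returns true if a TCP connection to host:port can be opened within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unroutable or unresolvable: all count as unreachable here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Each argument is expected as host:port, e.g. a master node and a data node on 9300.
        for (String arg : args) {
            int sep = arg.lastIndexOf(':');
            if (sep < 1) {
                System.out.println("skipping '" + arg + "' (expected host:port)");
                continue;
            }
            String host = arg.substring(0, sep);
            int port = Integer.parseInt(arg.substring(sep + 1));
            System.out.println(arg + " reachable: " + canConnect(host, port, 3000));
        }
    }
}
```

Run from the backend host with the master node address and one of the data node addresses from the DEBUG log as arguments. In a setup like mine, I would expect the master node to be reported as reachable and the data node as not.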
So I disabled cluster sniffing, and everything now works as expected.
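For reference, sniffing is controlled by the transport client setting client.transport.sniff. Below is a minimal sketch of how the per-cluster settings could be collected. The class ClusterClientSettings and the method forCluster are hypothetical names of mine and use only the JDK. I am assuming that in a real ES 6.x configuration class each entry would be passed to Settings.builder().put(key, value) before the transport client is created.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration: collects transport client settings for one cluster.
final class ClusterClientSettings {

    static Map<String, String> forCluster(String clusterName, boolean sniff) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("cluster.name", clusterName);
        // With sniffing off, the client only talks to the addresses it was explicitly given
        // (here: the master nodes) and does not try to reach discovered data nodes.
        settings.put("client.transport.sniff", Boolean.toString(sniff));
        return settings;
    }

    public static void main(String[] args) {
        // "cluster-1" is a placeholder for the real cluster name.
        forCluster("cluster-1", false)
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keeping the flag as a parameter makes it easy to leave sniffing on for an environment where the client can reach every node directly (such as my local setup) and switch it off where only the master nodes are reachable.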