Tags: apache-spark, elasticsearch, ibm-cloud, compose-db, analytics-engine

Could not get a Transport from the Transport Pool for host


I'm trying to write to an IBM Compose Elasticsearch sink from Spark Structured Streaming on IBM Analytics Engine. My Spark code:

import org.apache.spark.SparkFiles
import org.apache.spark.sql.streaming.OutputMode

dataDf
  .writeStream
  .outputMode(OutputMode.Append)
  .format("org.elasticsearch.spark.sql")
  .queryName("ElasticSink")
  .option("checkpointLocation", s"${s3Url}/checkpoint_elasticsearch")
  .option("es.nodes", "xxx1.composedb.com,xxx2.composedb.com")
  .option("es.port", "xxxx")
  .option("es.net.http.auth.user", "admin")
  .option("es.net.http.auth.pass", "xxxx")
  .option("es.net.ssl", true)
  .option("es.nodes.wan.only", true)
  .option("es.net.ssl.truststore.location", SparkFiles.getRootDirectory() + "/my.jks")
  .option("es.net.ssl.truststore.pass", "xxxx")
  .start("test/broadcast")

However, I'm receiving the following exception:

 org.elasticsearch.hadoop.EsHadoopException: Could not get a Transport from the Transport Pool for host [xxx2.composedb.com:xxxx]
    at org.elasticsearch.hadoop.rest.pooling.PooledHttpTransportFactory.borrowFrom(PooledHttpTransportFactory.java:106)
    at org.elasticsearch.hadoop.rest.pooling.PooledHttpTransportFactory.create(PooledHttpTransportFactory.java:55)
    at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
    at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
    at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:59)
    at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:94)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:317)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:576)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.sql.streaming.EsStreamQueryWriter.run(EsStreamQueryWriter.scala:41)
    at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink$$anonfun$addBatch$2$$anonfun$2.apply(EsSparkSqlStreamingSink.scala:52)
    at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink$$anonfun$addBatch$2$$anonfun$2.apply(EsSparkSqlStreamingSink.scala:51)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any ideas?


Solution

  • I modified the elasticsearch-hadoop library to log the underlying exception, and the root cause turned out to be that the truststore file could not be found at the expected path:

    org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot initialize SSL - Expected to find keystore file at [/tmp/spark-e2203f9c-4f0f-4929-870f-d491fce0ad06/userFiles-62df70b0-7b76-403d-80a1-8845fd67e6a0/my.jks] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid URI.
        at org.elasticsearch.hadoop.rest.pooling.PooledHttpTransportFactory.borrowFrom(PooledHttpTransportFactory.java:106)
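
    Given that error, the truststore apparently exists on the driver but not in the executors' `SparkFiles` directory, so `SparkFiles.getRootDirectory() + "/my.jks"` resolves to a path that is empty on the workers. One way to ship the file is `--files` at submit time; a sketch, assuming the job is launched with `spark-submit` (the local path, application class, and jar name below are illustrative placeholders):

    ```shell
    # Distribute my.jks into every executor's working directory so that
    # SparkFiles.getRootDirectory() + "/my.jks" resolves on the workers
    # as well as the driver. Path, --class, and jar name are placeholders.
    spark-submit \
      --files /local/path/to/my.jks \
      --class com.example.ElasticSink \
      elastic-sink-assembly.jar
    ```

    Equivalently, calling `spark.sparkContext.addFile("/local/path/to/my.jks")` before starting the stream should make the file available under `SparkFiles.getRootDirectory()` on each executor.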