
Elasticsearch pyspark connection in insecure mode

My end goal is to insert data from HDFS into Elasticsearch, but the issue I am facing is connectivity.

I am able to connect to my Elasticsearch node using the curl command below:

curl -u username -X GET' --insecure

But when it comes to connecting from Spark, I am unable to do so. My command to insert data is:

df.write.mode("append").format('org.elasticsearch.spark.sql').option("", "username").option("", "password").option("","true").option('es.nodes', '').option('es.port','9200').save('my-index/my-doctype')

The error I am getting is:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
Caused by: Connection error (check network and/or proxy settings)- all nodes failed; tried [[]]

What would be the PySpark equivalent of curl --insecure?



  • After many attempts with different config options, I found a way to connect insecurely to Elasticsearch running on HTTPS:

            dfToEs.write.mode("append").format('org.elasticsearch.spark.sql') \
            .option("", username) \
            .option("", password) \
            .option("", "true") \
            .option("", "true") \
            .option("mergeSchema", "true") \
            .option('', 'true') \
            .option('es.nodes', 'https://{}'.format(es_ip)) \
            .option('es.port', '9200') \
            .option('es.batch.write.retry.wait', '100s') \
            .save('my-index/my-doctype')  # index/type taken from the question

    with the

    (, true)

    We also have to allow the self-signed certificate, as below:

    (, true)
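
    The option names in the snippet above did not survive in this post, so here is a minimal sketch of how such a write could look, assuming the standard ES-Hadoop setting names es.net.http.auth.user, es.net.http.auth.pass, es.net.ssl, es.net.ssl.cert.allow.self.signed and es.nodes.wan.only. Those key names, and the helper functions es_insecure_https_options and write_to_es, are my assumptions and not necessarily what the original answer used. Note that this is probably not a full equivalent of curl --insecure: it relaxes trust for self-signed certificates rather than switching off all certificate checks.

```python
def es_insecure_https_options(es_ip, username, password, port="9200"):
    """Build ES-Hadoop writer options for HTTPS with a self-signed certificate.

    The keys below are assumed from the ES-Hadoop configuration reference;
    they are not recovered from the original post.
    """
    return {
        "es.nodes": "https://{}".format(es_ip),
        "es.port": str(port),
        # HTTP basic authentication (what `curl -u` does)
        "es.net.http.auth.user": username,
        "es.net.http.auth.pass": password,
        # Talk to the cluster over SSL/TLS
        "es.net.ssl": "true",
        # Accept a self-signed server certificate
        "es.net.ssl.cert.allow.self.signed": "true",
        # Only use the declared node address, skip node discovery
        "es.nodes.wan.only": "true",
    }


def write_to_es(df, resource, options):
    """Append a DataFrame to an Elasticsearch resource (hypothetical helper)."""
    writer = df.write.mode("append").format("org.elasticsearch.spark.sql")
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.save(resource)


# Usage, with the names from the answer above:
# write_to_es(dfToEs, "my-index/my-doctype",
#             es_insecure_https_options(es_ip, username, password))
```

    Keeping the options in a dict makes it easy to print them and check what is actually sent to the connector when the "Cannot detect ES version" error shows up again.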