I have an AWS Glue job (Glue version 2, Python 3) that used to load data into an Elasticsearch cluster hosted on EC2 instances. The connection was made with a dependent JAR (elasticsearch-spark-20_2.11-7.8.1.jar). We have now moved to a managed OpenSearch 1.2 cluster (HTTPS required, fine-grained access control not enabled), and I'm trying to figure out how to connect to this new cluster from Glue. The OpenSearch cluster is in a private VPC that the Glue job role has access to. I have also given the Glue role full access to the OpenSearch service for testing purposes. I have tried:
# df is assumed to be the Spark DataFrame being written
df.write.format(
    'org.elasticsearch.spark.sql'
).mode(
    'overwrite'
).option(
    'es.nodes', 'full_https_endpoint'
).option(
    'es.port', 443
).option(
    'es.resource', 'index_name'
).option(
    'es.nodes.wan.only', True
).save()
but I get the error "An error occurred while calling o328.save. Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'"
I have also tried the Glue marketplace Elasticsearch connector:
ElasticsearchConnector7134forAWSGlue10and20_node1658268217103 = glueContext.write_dynamic_frame.from_options(
    frame=dynamicFrame_fin,
    connection_type="marketplace.spark",
    connection_options={
        "path": "index_name",
        "es.nodes.wan.only": "true",
        "es.nodes": "full_https_endpoint",
        "es.port": "443",
        "connectionName": "opensearch_dev",
    },
    transformation_ctx="ElasticsearchConnector7134forAWSGlue10and20_node1658268217103",
)
but I get a similar error: "An error occurred while calling o323.pyWriteDynamicFrame. Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'"
Questions:
This also tripped me up for days. When you created the OpenSearch cluster, did you check "enable compatibility mode"?
Without this mode enabled, if you hit your domain's endpoint to retrieve the version, you'll get back 1.2.0, which the driver you've wired up isn't expecting, so it fails with the same error you've posted.
When you enable compatibility mode, the cluster reports its version as a number your driver can understand.
Example with compatibility turned on:
"version": {
  "number": "7.10.2"
}
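If you want to confirm which version your domain is reporting before re-running the Glue job, a minimal sketch along these lines may help. It assumes the domain's access policy allows an unsigned GET on the root path from a machine inside the VPC. The helper names (`fetch_root`, `reported_version`, `looks_like_es7`) are made up for illustration, and the check that a 7.x connector wants a major version of 7 is an assumption based on the behaviour described above, not on the connector's source.

```python
import json
import urllib.request


def fetch_root(endpoint: str) -> str:
    """Return the body of GET / on the domain endpoint.

    Assumes the caller can reach the endpoint (for example from inside
    the VPC) and that unsigned requests are allowed by the access policy.
    """
    with urllib.request.urlopen(endpoint, timeout=10) as resp:
        return resp.read().decode("utf-8")


def reported_version(root_response: str) -> str:
    """Extract version.number from the JSON returned by GET /."""
    return json.loads(root_response)["version"]["number"]


def looks_like_es7(version: str) -> bool:
    """True if the major version is 7 (assumed to be what a 7.x connector expects)."""
    return version.split(".")[0] == "7"
```

Calling `looks_like_es7(reported_version(fetch_root("https://full_https_endpoint")))`, with the placeholder replaced by your own endpoint, should come back `False` while the domain still reports 1.2.0. As far as I know, compatibility mode can also be turned on for an existing domain by setting `compatibility.override_main_response_version` to `true` through `PUT _cluster/settings`, but check the current OpenSearch Service docs before relying on that.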
The rest of your setup looks good, so hopefully this is what's blocking you.