I have a Glue job to push data into AWS OpenSearch. Everythings works perfectly when I have an "open" permission on OpenSearch, for example:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "*"
},
"Action": "es:*",
"Resource": "arn:aws:es:<region>:<accountId>:domain/<domain>/*"
}
]
}
This works without issue. The problem is I want to secure my OpenSearch domain to only the role running the glue job.
I attempted to do that starting basic with:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::<accountId>:role/AWSGluePowerUser"
]
},
"Action": [
"*"
],
"Resource": [
"*"
]
}
]
}
This disables all access to OpenSearch which I want, however it also blocks it for Glue even though the jobs a running with the AWSGluePowerUser
role set.
An error occurred while calling o805.pyWriteDynamicFrame. Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
Which I assume is because the Glue job can no longer see the OpenSearch cluster. Keep in mind everything works when using the "default" access policy for OpenSearch.
I have my glue job configured to use the IAM role AWSGluePowerUser
which also has AmazonOpenSearchServiceFullAccess
policy attached.
I'm not sure where I've gone wrong here?
Edit: Here is where/how I've set the roles for the Glue job, I assume this is all I needed to do?
I believe this is not possible because the AWS Glue Elasticsearch connector is based on an open-source Elasticsearch Spark library that doest not sign requests using AWS Signature Version 4 which is required for enforcing domain access policies.
If you take a look at the key concepts for fine-grained access control in OpenSearch, you'll see:
If you choose IAM for your master user, all requests to the cluster must be signed using AWS Signature Version 4.
If you visit the Elasticsearch Connector for AWS Glue AWS Marketplace page, you'll notice that the connector itself is based on an open-source implementation:
For more details about this open-source Elasticsearch spark connector, please refer to this open-source connector online reference
Under the hood, AWS Glue is using this library to index data from Spark dataframes to the Elasticsearch endpoint. Since this open-source library (maintained by the Elasticsearch community) does not have support for signing requests using using AWS Signature Version 4, it will only work with the "open permission" you've referenced. This is hinted at in the big picture on fine-grained access control:
In general, if you enable fine-grained access control, we recommend using a domain access policy that doesn't require signed requests.
Note that you can always fall back us using a master user based on username/password:
Hope this helps!