amazon-web-services amazon-iam aws-glue

AWS Glue IAM role can't connect to AWS OpenSearch


I have a Glue job that pushes data into AWS OpenSearch. Everything works perfectly when I have an "open" permission on OpenSearch, for example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:<region>:<accountId>:domain/<domain>/*"
    }
  ]
}

This works without issue. The problem is that I want to restrict my OpenSearch domain so that only the role running the Glue job can access it.

I attempted to do that, starting with a basic policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<accountId>:role/AWSGluePowerUser"
        ]
      },
      "Action": [
        "*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

This disables all access to OpenSearch, which is what I want. However, it also blocks access for Glue, even though the jobs are running with the AWSGluePowerUser role set.

An error occurred while calling o805.pyWriteDynamicFrame. Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

I assume this is because the Glue job can no longer see the OpenSearch cluster. Keep in mind that everything works when using the "default" (open) access policy for OpenSearch.

I have my Glue job configured to use the IAM role AWSGluePowerUser, which also has the AmazonOpenSearchServiceFullAccess policy attached.

I'm not sure where I've gone wrong here?

Edit: Here is where/how I've set the roles for the Glue job, I assume this is all I needed to do?

From Glue Job Details: [image]


Solution

  • I believe this is not possible, because the AWS Glue Elasticsearch connector is based on an open-source Elasticsearch Spark library that does not sign requests using AWS Signature Version 4, which is required when a domain access policy restricts access to specific IAM principals.

    If you take a look at the key concepts for fine-grained access control in OpenSearch, you'll see:

    If you choose IAM for your master user, all requests to the cluster must be signed using AWS Signature Version 4.

    If you visit the Elasticsearch Connector for AWS Glue AWS Marketplace page, you'll notice that the connector itself is based on an open-source implementation:

    For more details about this open-source Elasticsearch spark connector, please refer to this open-source connector online reference

    Under the hood, AWS Glue uses this library to index data from Spark dataframes to the Elasticsearch endpoint. Since this open-source library (maintained by the Elasticsearch community) does not support signing requests using AWS Signature Version 4, it will only work with the "open permission" you've referenced. This is hinted at in the big picture on fine-grained access control:

    In general, if you enable fine-grained access control, we recommend using a domain access policy that doesn't require signed requests.
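
To make concrete what "signing a request using AWS Signature Version 4" involves (and therefore what the open-source connector does not do), here is a minimal, standard-library-only Python sketch of signing a GET request. It is an illustration under assumptions, not the connector's code or a replacement for an AWS SDK: the function names, host, and credentials are hypothetical, and it omits the `X-Amz-Security-Token` header that temporary role credentials (such as those of a Glue job role) would also need.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key by chaining HMACs over date, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_get_request(host, path, region, access_key, secret_key, now, service="es"):
    """Return the headers a SigV4-signed GET with no query string and no body would carry.

    "es" is the signing name used by the OpenSearch Service; all other
    arguments are caller-supplied placeholders in this sketch.
    """
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    signed_headers = "host;x-amz-date"
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    payload_hash = hashlib.sha256(b"").hexdigest()  # empty request body

    canonical_request = "\n".join([
        "GET",
        quote(path, safe="/-_.~"),
        "",  # empty canonical query string
        canonical_headers,
        signed_headers,
        payload_hash,
    ])

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    signing_key = derive_signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "host": host,
        "x-amz-date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }
```

A client that sends these headers lets OpenSearch identify the calling IAM principal and evaluate it against the domain access policy. A client that sends none of them, as the connector does, is treated as anonymous and only passes an open policy.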

    Note that you can always fall back to using a master user based on username/password:

    1. Create a master user (username/password) for the OpenSearch domain's fine-grained access control configuration.
    2. Store the username/password in an AWS Secrets Manager secret as described here.
    3. Attach the secret to the AWS Glue connector as described here.
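
The steps above can be sketched in Python as follows. This is a hedged illustration, not an official recipe: the helper names, connection name, endpoint, and index are hypothetical, and the secret key names `es.net.http.auth.user` / `es.net.http.auth.pass` are assumed here because they are the basic-auth settings of the open-source Elasticsearch Spark library; check the connector's documentation for the exact keys it expects.

```python
import json


def build_connector_secret(username: str, password: str) -> str:
    """Return the JSON string to store as the Secrets Manager secret value (step 2).

    Assumption: the connector reads the library's basic-auth settings
    from keys with these names.
    """
    return json.dumps({
        "es.net.http.auth.user": username,
        "es.net.http.auth.pass": password,
    })


def build_connection_options(connection_name: str, endpoint: str, index: str) -> dict:
    """Return write options for a Glue job using a connection that has the secret attached (step 3).

    Credentials are deliberately absent: they come from the attached secret.
    """
    return {
        "connectionName": connection_name,  # hypothetical Glue connection name
        "path": index,                      # target index
        "es.nodes": endpoint,               # e.g. "https://<domain endpoint>"
        "es.port": "443",
        "es.nodes.wan.only": "true",        # the setting named in the error message
    }


# Inside a Glue job, the options would be passed along these lines
# (assumes a GlueContext `glueContext` and a DynamicFrame `dynamic_frame`):
#
# glueContext.write_dynamic_frame.from_options(
#     frame=dynamic_frame,
#     connection_type="marketplace.spark",
#     connection_options=build_connection_options(
#         "my-es-connection", "https://<domain endpoint>", "my-index"
#     ),
# )
```

Keeping the username and password in the secret rather than in the job script means the job only needs permission to read that secret, and the domain authenticates the connector through fine-grained access control instead of signed requests.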

    Hope this helps!