So, I have two Elasticsearch 7.10 clusters.
I have a Cognito user pool with an admin group. This admin group has an IAM role attached to it; call it the AdminRole. Its precedence is 1.
Now, I have configured both of the aforementioned Elasticsearch clusters to utilize Cognito authentication. They both use the same user pool, and the same identity pool.
That being said, when I log into the older cluster, click on my icon in the top right and then "view roles and identities", I see arn:aws:iam::{myaccountnumber}:role/cognito-AdminRole.
However, when I try the same in the new cluster, I see arn:aws:iam::{myaccountnumber}:role/cognito-auth-role. Why? Why is it picking up the Cognito auth role instead of the role assigned to the group?
I am logging in both times on the same account from the same cognito pool - that account is in the cognito group.
In both clusters, I have no backend roles referencing that auth role. If I add the cognito-auth-role as a master user (via ARN of course), then I can log in fine on the newer cluster (the one that's setting my backend role to cognito-auth-role). When I remove the cognito-auth-role as a backend role from the all_access and security_manager roles, I can no longer log in to that cluster, and I get the fabled "missing role" error.
In both cases, the ARN of the Cognito admin group's role stays as a backend role for all_access and security_manager.
In other words - how do I force the cluster to assume arn:aws:iam::{myaccountnumber}:role/cognito-AdminRole for me instead of arn:aws:iam::{myaccountnumber}:role/cognito-auth-role? It's clearly possible, since the group's role is automatically assumed when I log in to the old cluster.
So - Genuinely, I have struggled with this for the last 40 hours at work, while doing other things.
About 35 minutes after posting to Stack Overflow, I found the answer.
The old cluster had an Authentication Role Selection rule in the identity pool: "Choose role from token". I added the same rule for the new cluster and set its role resolution to "Deny", as it was for the old cluster, and now I get the same behaviour from both clusters.
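
For anyone who would rather script this than click through the console, here is a rough sketch of the same setting using boto3. It assumes the rule is stored per authentication provider, keyed as "cognito-idp.{region}.amazonaws.com/{user pool id}:{app client id}", and that the new cluster has its own app client in the user pool. The helper names and all the IDs are hypothetical placeholders; only `get_identity_pool_roles` and `set_identity_pool_roles` are real boto3 calls. As far as I can tell, setting role mappings overwrites what is already there, so the sketch merges with the existing mappings first.

```python
def provider_key(region, user_pool_id, app_client_id):
    """Build the identity-provider key used in an identity pool's role mappings."""
    return f"cognito-idp.{region}.amazonaws.com/{user_pool_id}:{app_client_id}"


def token_role_mapping(region, user_pool_id, app_client_id):
    """Mapping equivalent to "Choose role from token" with role resolution "Deny"."""
    return {
        provider_key(region, user_pool_id, app_client_id): {
            "Type": "Token",
            "AmbiguousRoleResolution": "Deny",
        }
    }


def merge_role_mappings(existing, new):
    """Return a copy of the existing mappings with the new entries added or replaced."""
    merged = dict(existing)
    merged.update(new)
    return merged


def apply_role_mapping(identity_pool_id, new_mapping):
    """Fetch the pool's current roles and mappings, merge in new_mapping, write back."""
    import boto3  # imported here so the pure helpers above work without AWS access

    client = boto3.client("cognito-identity")
    current = client.get_identity_pool_roles(IdentityPoolId=identity_pool_id)
    client.set_identity_pool_roles(
        IdentityPoolId=identity_pool_id,
        Roles=current.get("Roles", {}),
        RoleMappings=merge_role_mappings(current.get("RoleMappings", {}), new_mapping),
    )
```

Usage would look something like `apply_role_mapping("<identity pool id>", token_role_mapping("<region>", "<user pool id>", "<new cluster's app client id>"))`, with your own values filled in. I have only verified the console route described above, so treat the script as a starting point.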