I have several micro-services running in AWS, some of which communicate with each other, some of them having external clients or being clients to external services.
To implement my services I need a number of secrets (RSA key pairs to sign/verify tokens, symmetric keys, API keys, etc.). I am using AWS Secrets Manager for this, and it works fine, but I'm now in the process of implementing proper support for key rotation and I have a few thoughts.
Let's say service A needs a key K for service B:
Is this the best approach or are there others to consider?
Then, in some situations I have a symmetric key J that is used within the same service, for example a key to encrypt some session with. So in one request to service C, a session is encrypted with key J1, then needs to be decrypted with J1 at a later stage. I have multiple instances of the C service.
The problem here is that if the same secret is used for both encryption and decryption, rotating it becomes messier: if the key is rotated to the value J2 and one instance has refreshed so that it encrypts with J2 while another instance doesn't see J2 yet, decryption on that second instance will fail.
I can see a few approaches here:
1. Split into two secrets with separate rotation schemes and rotate one at a time, similar to the above. This adds overhead in terms of extra secrets to handle, with identical values (apart from them being rotated with some time in between).
2. Let the decryption force a refresh of the secret upon failure:
3. Use three keys in the key window and always encrypt with the middle one, since it should always be in the window of all other instances (unless it was rotated several times, faster than the refresh interval). This adds complexity.
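The second approach (refresh on decryption failure) could look roughly like the sketch below. It is only an illustration under assumptions: RefreshingKeyring, load_keys, seal and unseal are hypothetical names, and the HMAC-tagged seal/unseal pair is a stand-in for a real authenticated cipher such as AES-GCM, not actual encryption.

```python
import hashlib
import hmac
from typing import Callable, List


class DecryptError(Exception):
    """Raised when a key (or every known key) fails to open a blob."""


def seal(key: bytes, data: bytes) -> bytes:
    # Stand-in for authenticated encryption: prefix an HMAC tag (NOT encryption).
    return hmac.new(key, data, hashlib.sha256).digest() + data


def unseal(key: bytes, blob: bytes) -> bytes:
    tag, data = blob[:32], blob[32:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise DecryptError("wrong key")
    return data


class RefreshingKeyring:
    """Caches a window of keys and reloads it once when decryption fails."""

    def __init__(self, load_keys: Callable[[], List[bytes]]):
        # load_keys is a hypothetical loader, e.g. a wrapper around Secrets Manager.
        # Assumed convention: it returns the newest key first.
        self._load_keys = load_keys
        self._keys = load_keys()

    def encryption_key(self) -> bytes:
        return self._keys[0]

    def open(self, blob: bytes) -> bytes:
        for attempt in range(2):
            for key in self._keys:
                try:
                    return unseal(key, blob)
                except DecryptError:
                    continue
            if attempt == 0:
                # Another instance may already use a key we have not seen yet.
                self._keys = self._load_keys()
        raise DecryptError("no key in the refreshed window matches")
```

One design consideration: a forced refresh on every failure lets anyone who sends garbage trigger calls to the secret store, so in practice the refresh would probably need to be rate limited.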
What other options are there? This seems like such a standard use case, but I have still struggled to find the best approach.
EDIT ------------------
Based on JoeB's answer, the algorithm I've come up with so far is this: Let's say that initially the secret has the CURRENT value K1, and PENDING value null.
Normal operation

- All services fetch AWSCURRENT, AWSPENDING and the custom label ROTATING and accept them all (if they exist) -> all services accept [AWSCURRENT=K1]
- All clients use AWSCURRENT=K1

Key rotation

1. Add K2 as the pending value and wait T seconds -> all services accept [AWSCURRENT=K1, AWSPENDING=K2]
2. Add ROTATING to the K1 version + move AWSCURRENT to the K2 version + remove the AWSPENDING label from K2 (there seems to be no atomic swapping of labels). Until T seconds have passed, some clients will use K2 and some K1, but all services accept both
3. Wait T seconds -> all services accept [AWSCURRENT=K2, ROTATING=K1] and all clients use AWSCURRENT=K2
4. Remove the ROTATING stage from K1. Note that K1 will still have the AWSPREVIOUS stage.
5. Wait T seconds -> all services accept [AWSCURRENT=K2], and K1 is effectively dead.

This should work both for separate secrets and for symmetric secrets used for both encryption and decryption.
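On the service side, loading the set of accepted keys by staging label might look like the sketch below. It assumes a boto3 Secrets Manager client is passed in and that the secret is stored as a string; load_accepted_keys and STAGES are illustrative names, and ROTATING is the custom label from the algorithm above, not a built-in one.

```python
from typing import Any, Dict

# AWSCURRENT and AWSPENDING are built-in labels; ROTATING is the custom one.
STAGES = ("AWSCURRENT", "AWSPENDING", "ROTATING")


def load_accepted_keys(client: Any, secret_id: str) -> Dict[str, str]:
    """Return {stage: key} for every stage that currently exists on the secret."""
    accepted: Dict[str, str] = {}
    for stage in STAGES:
        try:
            resp = client.get_secret_value(SecretId=secret_id, VersionStage=stage)
        except client.exceptions.ResourceNotFoundException:
            # No version carries this label right now, so skip it.
            continue
        accepted[stage] = resp["SecretString"]
    return accepted


def outgoing_key(accepted: Dict[str, str]) -> str:
    """Clients always encrypt or sign with AWSCURRENT."""
    return accepted["AWSCURRENT"]
```

With real credentials the client would be created with boto3.client("secretsmanager"); a service would call load_accepted_keys on its refresh interval and verify incoming data against every value in the returned dictionary.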
Unfortunately I don't know how to use the built-in rotation mechanism for this, since it requires several steps with delays in between. One idea is to invent some custom steps and have the setSecret step create a CloudWatch cron event that will invoke the function again after T seconds, calling it with the steps swapPending and removePending. It would be awesome if Secrets Manager could support this automatically, for example by letting the function return a value indicating that the next step should be invoked after T seconds.
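For reference, the label swap described in the algorithm (add ROTATING to the K1 version, move AWSCURRENT to the K2 version, remove AWSPENDING from K2) could be expressed with three update_secret_version_stage calls. This is a sketch, assuming a boto3 Secrets Manager client is passed in; swap_labels and its parameter names are made up for illustration.

```python
from typing import Any


def swap_labels(client: Any, secret_id: str, old_version: str, new_version: str) -> None:
    """Promote new_version to AWSCURRENT while old_version stays acceptable."""
    # 1. Attach the custom ROTATING label to the old version (K1).
    client.update_secret_version_stage(
        SecretId=secret_id,
        VersionStage="ROTATING",
        MoveToVersionId=old_version,
    )
    # 2. Move AWSCURRENT from the old version to the new one (K2).
    #    Secrets Manager moves AWSPREVIOUS to the old version by itself.
    client.update_secret_version_stage(
        SecretId=secret_id,
        VersionStage="AWSCURRENT",
        MoveToVersionId=new_version,
        RemoveFromVersionId=old_version,
    )
    # 3. The new version is no longer pending.
    client.update_secret_version_stage(
        SecretId=secret_id,
        VersionStage="AWSPENDING",
        RemoveFromVersionId=new_version,
    )
```

The three calls are not atomic, which is why the ROTATING label is attached first: at every intermediate point the old key still carries a label that services accept.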
For your credential question, you do not have to keep both the current and previous credentials in the application as long as service B supports two active credentials. To do this you must ensure a credential is not marked AWSCURRENT until it is ready. Then the application just always fetches and uses the AWSCURRENT credential. To do this in the rotation lambda you would take the steps:
These are the same steps Secrets Manager takes when it creates a multi-user RDS rotation Lambda. Be sure to use the AWSPENDING label, because Secrets Manager treats it specially. If service B does not support two active credentials or multiple users sharing data, there might not be a way to do this. See the Secrets Manager rotation docs on this.
In addition, the Secrets Manager rotation engine is asynchronous and will retry after failures (which is why each Lambda step must be idempotent). There is an initial set of retries (on the order of 5) and then some daily retries thereafter. You can take advantage of this by failing the third step (testing the secret) with an exception until the propagation conditions are met. Alternatively, you can raise the Lambda timeout to 15 minutes and sleep for an appropriate amount of time while waiting for propagation to complete. The sleep method, though, has the disadvantage of tying up resources needlessly.
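A rough sketch of the "fail the test step until propagation is done" idea follows. It assumes the propagation condition is simply "the pending version is at least T seconds old"; PROPAGATION_SECONDS, NotPropagatedYet and check_propagation are hypothetical names, the client parameter on the handler exists only to make the sketch testable, and pagination of list_secret_version_ids is ignored.

```python
import datetime as dt
from typing import Any, Optional

PROPAGATION_SECONDS = 300  # assumed value of T, the clients' refresh interval


class NotPropagatedYet(Exception):
    """Raised so the rotation engine retries the step later."""


def check_propagation(client: Any, secret_id: str, token: str,
                      now: Optional[dt.datetime] = None) -> None:
    now = now or dt.datetime.now(dt.timezone.utc)
    versions = client.list_secret_version_ids(SecretId=secret_id)["Versions"]
    created = next(
        (v["CreatedDate"] for v in versions if v["VersionId"] == token), None
    )
    if created is None:
        raise ValueError(f"version {token} not found on secret {secret_id}")
    age = (now - created).total_seconds()
    if age < PROPAGATION_SECONDS:
        raise NotPropagatedYet(f"pending version is only {age:.0f}s old")


def lambda_handler(event: dict, context: Any, client: Any = None) -> None:
    if client is None:
        import boto3  # imported lazily so the sketch loads without boto3
        client = boto3.client("secretsmanager")
    if event["Step"] == "testSecret":
        # Raising here fails the step; the engine will invoke it again.
        check_propagation(client, event["SecretId"], event["ClientRequestToken"])
    # createSecret, setSecret and finishSecret are omitted from this sketch.
```

Because the retry schedule is controlled by Secrets Manager rather than by the function, the actual delay before the step succeeds may be considerably longer than T.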
Keep in mind that as soon as you remove the pending stage or move AWSCURRENT to the pending version, the rotation engine will stop. If application B accepts current and pending (or current, pending, and previous if you want to be extra safe), the four steps above will work if you add the delay you described. You can also look at the AWS Secrets Manager Sample Lambdas for examples of how the stages are manipulated for database rotations.
For your encryption question, the best way I have seen to do this is to store an identifier of the encryption key with the encrypted data. So when you encrypt data D1 with key J1, you either store or otherwise pass to the downstream application something like the secret ARN and version (say V). If service A is sending encrypted data to service B in a message M(...) it would work as follows:
Note that the keys can be cached by both A and B. If the encrypted data is to be stored long term, you will have to ensure that a key is not deleted until the encrypted data either no longer exists or has been re-encrypted with the current key. You can also use multiple secrets (instead of versions) by passing different ARNs.
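As an illustration of passing the key identifier along with the data, here is a minimal sketch. The envelope field names (key_arn, key_version, data), the wrap/unwrap helpers and the KeyCache class are all made up for this example; the cipher is passed in as a callable, and the key is assumed to be stored as a binary secret and fetched through a boto3 Secrets Manager client.

```python
import base64
import json
from typing import Any, Callable, Dict, Tuple

Cipher = Callable[[bytes, bytes], bytes]  # (key, data) -> data


def wrap(arn: str, version_id: str, key: bytes, plaintext: bytes,
         encrypt: Cipher) -> str:
    """Sender side: ship the key's ARN and version next to the ciphertext."""
    return json.dumps({
        "key_arn": arn,
        "key_version": version_id,
        "data": base64.b64encode(encrypt(key, plaintext)).decode("ascii"),
    })


class KeyCache:
    """Receiver side: fetch each (ARN, version) once and keep it in memory."""

    def __init__(self, client: Any):
        self._client = client
        self._keys: Dict[Tuple[str, str], bytes] = {}

    def get(self, arn: str, version_id: str) -> bytes:
        ident = (arn, version_id)
        if ident not in self._keys:
            resp = self._client.get_secret_value(SecretId=arn, VersionId=version_id)
            self._keys[ident] = resp["SecretBinary"]  # assumes a binary secret
        return self._keys[ident]


def unwrap(message: str, cache: KeyCache, decrypt: Cipher) -> bytes:
    env = json.loads(message)
    key = cache.get(env["key_arn"], env["key_version"])
    return decrypt(key, base64.b64decode(env["data"]))
```

Since a given version of a secret does not change its value, caching by (ARN, version) never goes stale; only the choice of which version to encrypt with needs refreshing.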
Another alternative is to use KMS for encryption. Instead of the key identifier, service A would send the encrypted KMS data key along with the encrypted payload. Service B can then decrypt the data key by calling KMS and use it to decrypt the payload.
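The KMS variant could be sketched as follows, assuming a boto3 KMS client is passed in and both services are allowed to use the KMS key. The helper names seal_for_peer and open_from_peer and the message field names are illustrative only, and the payload cipher is again supplied as a callable.

```python
from typing import Any, Callable, Dict

Cipher = Callable[[bytes, bytes], bytes]  # (key, data) -> data


def seal_for_peer(kms: Any, kms_key_id: str, plaintext: bytes,
                  encrypt: Cipher) -> Dict[str, bytes]:
    """Service A: get a fresh data key and send its encrypted form along."""
    data_key = kms.generate_data_key(KeyId=kms_key_id, KeySpec="AES_256")
    return {
        "encrypted_data_key": data_key["CiphertextBlob"],
        "payload": encrypt(data_key["Plaintext"], plaintext),
    }


def open_from_peer(kms: Any, message: Dict[str, bytes], decrypt: Cipher) -> bytes:
    """Service B: ask KMS to decrypt the data key, then decrypt the payload."""
    resp = kms.decrypt(CiphertextBlob=message["encrypted_data_key"])
    return decrypt(resp["Plaintext"], message["payload"])
```

With this layout, rotation of the underlying KMS key is handled by KMS, since the encrypted data key identifies the key material it was wrapped with.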