I have a VM deployed via Terraform with various tags set on it. Recently, a new "tag inheritance" Azure Policy was assigned to the subscription containing that VM. The policy requires all resources in the subscription to carry all the tags from their parent resource group and subscription.
Once applied, this policy added all the missing tags to my VM. However, whenever I deploy my IaC template, it now highlights those additional tags as drift and attempts to remove them. Of course, the Azure Policy immediately then re-adds them, leading to a perpetual fight between Azure Policy and IaC.
I know I could easily update the IaC template to include the additional tags, but what happens when someone adds another tag to the parent resource group or subscription? And what happens when another Azure Policy is defined that updates my resources in some way (e.g. enabling Azure Defender can automatically add extensions to VMs)?
My question: is there any documentation that describes current best practices for preventing this situation in general? I'm hoping for something reasonably "official", ideally from one of the big cloud providers, that describes when it's acceptable or advisable to use such "auto-remediation" tooling.
Note: My personal opinion is that resources managed by IaC should be exempt from any automatic remediation. Instead, any non-compliance should be detected by "audit only" policies that raise alerts. Those alerts would, ultimately, result in new work items being added to the backlog of the team that owns the IaC template(s).
... however, I'd be perfectly fine with something that proves me wrong on that front :)
I know it's not the answer you're looking for, but unfortunately it's all about company governance.
The purpose of IaC is to make everything easy to replicate, to avoid giving broad privileges to everyone, to enable an infrastructure review process before deployment, and much more.
The policies and remediation tools themselves could also be deployed using Terraform, which would let you bind those resources together in the same codebase.
But everything could also be deployed manually with some auto-remediation tools in place. Or a mix of both.
I understand this situation can be frustrating, but I think there is no easy fix. Still, there are some things you could do:
ignore_changes (in the resource's lifecycle block)
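For the ignore_changes option, here is a minimal sketch of what that could look like on the VM from the question. The resource type (azurerm_linux_virtual_machine), the resource name "example", and the tag keys are assumptions for illustration; the VM's other required arguments are omitted, so this is a fragment rather than a complete configuration.

```hcl
# Hypothetical VM resource; the type, name, and tag values are placeholders.
resource "azurerm_linux_virtual_machine" "example" {
  # ... the VM's other required arguments are omitted from this fragment ...

  # Tags that Terraform sets when the resource is created.
  tags = {
    environment = "dev"
  }

  lifecycle {
    # Tell Terraform not to treat later changes to the tags attribute as
    # drift, so tags added by Azure Policy are left alone on the next apply.
    ignore_changes = [tags]

    # Alternative (assumed key name): ignore only specific inherited keys,
    # so Terraform keeps managing the rest of the map.
    # ignore_changes = [tags["CostCenter"]]
  }
}
```

The trade-off is that with the whole tags attribute ignored, Terraform also stops applying any later edits you make to tags in the template, so the per-key form may fit better when the inherited tag names are known in advance.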