Search code examples
terraformterraform-provider-awsterraform-provider-gcp

Shall we include ForceNew or throw an error during "terraform apply" for a new TF resource?


Context: Implementation a Terraform Provider via TF Provider SDKv2 by following an official tutorial.

As a result, all of the schema elements in the corresponding Terraform Provider resource aws_launch_configuration are marked as ForceNew: true. This behavior instructs Terraform to first destroy and then recreate the resource if any of the attributes change in the configuration, as opposed to trying to update the existing resource.

TF tutorial suggests we should add ForceNew: true for every non-updatable field like:

"base_image": {
  Type:     schema.TypeString,
  Required: true,
  ForceNew: true,
},

resource "example_instance" "ex" {
  name = "bastion host"
  base_image = "ubuntu_17.10" # base_image updates are not supported
}

However one might run into the following:

Let's consider "important" resources foo_db_instance (a DB instance that should be deleted / recreated in exceptional scenarios) (related unanswered question) that has name attribute:

resource "foo_db_instance" "ex" {
  name = "bar" # name updates are not supported
  ...
}

However its underlying API was written in a weird way and it doesn't support updates for name attribute. There're 2 options:

  1. Following approach of the tutorial, we might add ForceNew: true and then, if a user doesn't pay attention to terraform plan output it might recreate foo_db_instance.ex when updating name attribute by accident that will create an outage.

  2. Don't follow the approach from the tutorial and don't add ForceNew: true. As a result terraform plan will not output the error and it will make it look like the update is possible. However when running terraform apply a user will run into an error, if we add a custom code to resourceUpdate() like this:

func resourceUpdate(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
    if d.HasChanges("name) {
        return diag.Errorf("error updating foo_db_instance: name attribute updates are not supported")
    }
  ...
}

There're 2 disadvantages of this approach:

  • non-failing output of terraform plan
  • we might need some hack to restore tf state to override d.Set(name, oldValue).

Which approach should be preferrable?

I know there's prevent_destroy = true lifecycle attribute but it seems like it won't prevent this specific scenario (it only prevents from accidental terraform destroy).


Solution

  • The most typical answer is to follow your first option, and then allow Terraform to report in its UI that the change requires replacement and allow the user to decide how to proceed.

    It is true that if someone does not read the plan output then they can potentially make a change they did not intend to make, but in that case the user is not making use of the specific mechanism that Terraform provides to help users avoid making undesirable changes.

    You mentioned prevent_destroy = true and indeed that this a setting that's relevant to this situation, and is in fact exactly what that option is for: it will cause Terraform to raise an error if the plan includes a "replace" action for the resource that was annotated with that setting, thereby preventing the user from accepting the plan and thus from destroying the object.

    Some users also wrap Terraform in automation which will perform more complicated custom policy checks on the generated plan, either achieving a similar effect as prevent_destroy (blocking the operation altogether) or alternatively just requiring an additional confirmation to help ensure that the operator is aware that something unusual is happening. For example, in Terraform Cloud a programmatic policy can report a "soft failure" which causes an additional confirmation step that might be approvable only by a smaller subset of operators who are better equipped to understand the impact of what's being proposed.


    It is in principle possible to write logic in either the CustomizeDiff function (which runs during planning) or the Update function (which runs during the apply step) to return an error in this or any other situation you can write logic for in the Go programming language. Of these two options I would say that CustomizeDiff would be preferable since that would then prevent creating a plan at all, rather than allowing the creation of a plan and then failing partway through the apply step, when some other upstream changes may have already been applied.

    However, to do either of these would be inconsistent with the usual behavior users expect for Terraform providers. The intended model is for a Terraform provider to describe the effect of a change as accurately as possible and then allow the operator to make the final decision about whether the proposed change is acceptable, and to cancel the plan and choose another strategy if not.