Tags: amazon-web-services, terraform, amazon-rds, terraform-provider-aws

InvalidDBClusterStateFault: Source cluster is in a state which is not valid for physical replication when adding a new rds cluster in global cluster


I am using Terraform to set up an RDS Global Cluster across two regions, us-east-1 and us-east-2. The engine is "aurora-postgresql" and the engine_version is "13.4".

I already had an existing cluster in us-east-1 that was created without Terraform, which I imported into Terraform. I now want to create a global cluster with a second cluster in us-east-2, so I am following the relevant part of the AWS provider docs.

Here is what my current HCL looks like:

# provider.tf

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  alias  = "useast1"
  region = "us-east-1"

  assume_role {
    role_arn = var.TF_IAM_ROLE_ARN
  }
}

provider "aws" {
  alias  = "useast2"
  region = "us-east-2"

  assume_role {
    role_arn = var.TF_IAM_ROLE_ARN
  }
}


# rds.tf

locals {
  rds-monitoring-role_arn = "iam role for rds monitoring"
  kms_key_id = {
    "us-east-1" : "aws managed rds key arn in us-east-1"
    "us-east-2" : "aws managed rds key arn in us-east-2"
  }
}


resource "aws_rds_global_cluster" "global-lego-production" {
  global_cluster_identifier    = "global-lego-production"
  force_destroy                = true
  source_db_cluster_identifier = aws_rds_cluster.lego-production-us-east-1.arn

  lifecycle {
    ignore_changes = [
      engine_version,
      database_name
    ]
  }
}

resource "aws_rds_cluster" "lego-production-us-east-1" {
  provider                        = aws.useast1
  engine                          = "aurora-postgresql"
  engine_version                  = "13.4"
  cluster_identifier              = "lego-production"
  master_username                 = "nektar"
  master_password                 = var.RDS_MASTER_PASSWORD
  database_name                   = "lego"
  db_subnet_group_name            = module.us-east-1.rds-lego-prod-subnet-group-id
  db_cluster_parameter_group_name = module.us-east-1.rds-lego-production-parameter-group-id
  backup_retention_period         = 7

  storage_encrypted = true
  kms_key_id        = local.kms_key_id.us-east-1

  copy_tags_to_snapshot               = true
  deletion_protection                 = true
  skip_final_snapshot                 = true
  iam_database_authentication_enabled = true
  enabled_cloudwatch_logs_exports     = ["postgresql"]

  vpc_security_group_ids = [
    module.us-east-1.rds-db-webserver-security-group-id,
    module.us-east-1.rds-db-quicksight-security-group-id
  ]

  tags = {
    vpc = "nektar"
  }

  lifecycle {
    ignore_changes = [
      engine_version,
      global_cluster_identifier
    ]
  }
}

resource "aws_rds_cluster_instance" "lego-production-us-east-1-instance-1" {
  provider             = aws.useast1
  engine               = aws_rds_cluster.lego-production-us-east-1.engine
  engine_version       = aws_rds_cluster.lego-production-us-east-1.engine_version
  identifier           = "lego-production-instance-1"
  cluster_identifier   = aws_rds_cluster.lego-production-us-east-1.id
  instance_class       = "db.r6g.4xlarge"
  db_subnet_group_name = module.us-east-1.rds-lego-prod-subnet-group-id

  monitoring_role_arn                   = local.rds-monitoring-role_arn
  performance_insights_enabled          = true
  performance_insights_kms_key_id       = local.kms_key_id.us-east-1
  performance_insights_retention_period = 7
  monitoring_interval                   = 60

  tags = {
    "devops-guru-default" = "lego-production"
  }

  lifecycle {
    ignore_changes = [
      instance_class
    ]
  }
}

resource "aws_rds_cluster_instance" "lego-production-us-east-1-instance-2" {
  provider             = aws.useast1
  engine               = aws_rds_cluster.lego-production-us-east-1.engine
  engine_version       = aws_rds_cluster.lego-production-us-east-1.engine_version
  identifier           = "lego-production-instance-1-us-east-1b"
  cluster_identifier   = aws_rds_cluster.lego-production-us-east-1.id
  instance_class       = "db.r6g.4xlarge"
  db_subnet_group_name = module.us-east-1.rds-lego-prod-subnet-group-id

  monitoring_role_arn                   = local.rds-monitoring-role_arn
  performance_insights_enabled          = true
  performance_insights_kms_key_id       = local.kms_key_id.us-east-1
  performance_insights_retention_period = 7
  monitoring_interval                   = 60

  tags = {
    "devops-guru-default" = "lego-production"
  }

  lifecycle {
    ignore_changes = [
      instance_class
    ]
  }
}

resource "aws_rds_cluster" "lego-production-us-east-2" {
  provider                        = aws.useast2
  engine                          = aws_rds_cluster.lego-production-us-east-1.engine
  engine_version                  = aws_rds_cluster.lego-production-us-east-1.engine_version
  cluster_identifier              = "lego-production-us-east-2"
  global_cluster_identifier       = aws_rds_global_cluster.global-lego-production.id
  db_subnet_group_name            = module.us-east-2.rds-lego-prod-subnet-group-id
  db_cluster_parameter_group_name = module.us-east-2.rds-lego-production-parameter-group-id
  backup_retention_period         = 7

  storage_encrypted = true
  kms_key_id        = local.kms_key_id.us-east-2

  copy_tags_to_snapshot               = true
  deletion_protection                 = true
  skip_final_snapshot                 = true
  iam_database_authentication_enabled = true
  enabled_cloudwatch_logs_exports     = ["postgresql"]

  vpc_security_group_ids = [
    module.us-east-2.rds-db-webserver-security-group-id,
    module.us-east-2.rds-db-quicksight-security-group-id
  ]


  tags = {
    vpc = "nektar"
  }

  depends_on = [
    aws_rds_cluster.lego-production-us-east-1,
    aws_rds_cluster_instance.lego-production-us-east-1-instance-1,
    aws_rds_cluster_instance.lego-production-us-east-1-instance-2
  ]

  lifecycle {
    ignore_changes = [
      engine_version
    ]
  }
}

resource "aws_rds_cluster_instance" "lego-production-us-east-2-instance-1" {
  provider             = aws.useast2
  engine               = aws_rds_cluster.lego-production-us-east-1.engine
  engine_version       = aws_rds_cluster.lego-production-us-east-1.engine_version
  identifier           = "lego-production-instance-1"
  cluster_identifier   = aws_rds_cluster.lego-production-us-east-2.id
  instance_class       = "db.r6g.4xlarge"
  db_subnet_group_name = module.us-east-2.rds-lego-prod-subnet-group-id

  monitoring_role_arn                   = local.rds-monitoring-role_arn
  performance_insights_enabled          = true
  performance_insights_kms_key_id       = local.kms_key_id.us-east-2
  performance_insights_retention_period = 7
  monitoring_interval                   = 60

  tags = {
    "devops-guru-default" = "lego-production"
  }

  lifecycle {
    ignore_changes = [
      instance_class
    ]
  }
}

I applied it with terraform plan -out tfplan.out followed by terraform apply tfplan.out. The initial plan showed only three resources to add: the aws_rds_global_cluster, plus the aws_rds_cluster and aws_rds_cluster_instance in us-east-2.

The Global Cluster was created successfully (as seen in the AWS Console), but creating the RDS cluster in us-east-2 fails with the error InvalidDBClusterStateFault: Source cluster: arn:aws:rds:us-east-1:<account-id>:cluster:lego-production is in a state which is not valid for physical replication.

I tried the same thing using just the AWS Console (without Terraform, using "Add Region" through the "Modify" option after selecting the Global Cluster), and it shows the same error.

Which requirement am I missing for adding another region to my global cluster? It certainly isn't just Terraform acting up, and I couldn't find anywhere else on the internet where somebody encountered the same error.

If there is any other information that I should provide, please comment.


Solution

  • I needed the AWS Developer Support plan to resolve this.

    The reason for the InvalidDBClusterStateFault error turned out to be straightforward: the source cluster had pending changes that were scheduled to be applied at the next maintenance window.

    That's it! To view the pending changes, you can run the following command:

    aws rds describe-db-clusters --db-cluster-identifier lego-production --query 'DBClusters[].{DBClusterIdentifier:DBClusterIdentifier,PendingModifiedValues:PendingModifiedValues}'

    In my case, some changes made through Terraform were waiting to be applied at the next maintenance window. I had to add the following line to my aws_rds_cluster resource block so that those changes were applied immediately:

    resource "aws_rds_cluster" "lego-production-us-east-1" {
       ...
    +  apply_immediately = true
       ...
    }
    

    I did the same for the lego-production-us-east-2 resource block, just to be sure.

    Once I applied these changes, adding the cluster to the global cluster worked as expected.
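
    For reference, here is a minimal sketch of what the secondary cluster block might look like with the same setting. It is a fragment rather than a complete configuration: the resource names and references come from the question, and every argument not shown is assumed to stay exactly as it was there.

    ```hcl
    # Sketch only: arguments omitted here are assumed unchanged from the question.
    resource "aws_rds_cluster" "lego-production-us-east-2" {
      provider                  = aws.useast2
      cluster_identifier        = "lego-production-us-east-2"
      global_cluster_identifier = aws_rds_global_cluster.global-lego-production.id
      engine                    = aws_rds_cluster.lego-production-us-east-1.engine
      engine_version            = aws_rds_cluster.lego-production-us-east-1.engine_version

      # Apply modifications right away instead of queuing them for the next
      # maintenance window, so that no pending changes build up on the cluster.
      apply_immediately = true
    }
    ```

    Keep in mind that with apply_immediately = true, later modifications to the cluster also take effect outside the maintenance window, so it may be worth reviewing each plan with that in mind.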