
How to enable "Use for Hive table metadata" in "AWS Glue Data Catalog settings" using Terraform?


I am using Terraform to set up a Trino cluster managed by Amazon EMR.

Here is my Terraform code:

resource "aws_emr_cluster" "hm_amazon_emr_cluster" {
  name                              = "hm-trino"
  release_label                     = "emr-7.1.0"
  applications                      = ["HCatalog", "Trino"]
  master_instance_fleet {
    name                      = "Primary"
    target_on_demand_capacity = 3
    launch_specifications {
      on_demand_specification {
        allocation_strategy = "lowest-price"
      }
    }
    instance_type_configs {
      weighted_capacity = 1
      instance_type     = "r7g.xlarge"
    }
  }
  configurations_json = <<EOF
    [
      {
        "Classification": "trino-connector-hive",
        "Properties": {
          "hive.metastore": "glue"
        }
      }
    ]
  EOF
  # ...
}

To enable high availability (HA) for this Trino cluster, I have already made these changes:

  • Added HCatalog to applications.
  • Set master_instance_fleet.target_on_demand_capacity = 3.
  • Added the trino-connector-hive classification to configurations_json so that Trino uses Glue as its metastore.

Besides these, I also need to enable "Use for Hive table metadata" under "AWS Glue Data Catalog settings", the option shown in the Amazon EMR console UI.

However, I could not find any information about this setting in the aws_emr_cluster documentation at https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_cluster

Any ideas?


Solution

  • I found that adding a hive-site classification with "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory" to configurations_json corresponds to enabling "Use for Hive table metadata" under "AWS Glue Data Catalog settings" in the UI.

    Here is the final code:

    resource "aws_emr_cluster" "hm_amazon_emr_cluster" {
      name                              = "hm-trino"
      release_label                     = "emr-7.1.0"
      applications                      = ["HCatalog", "Trino"]
      master_instance_fleet {
        name                      = "Primary"
        target_on_demand_capacity = 3
        launch_specifications {
          on_demand_specification {
            allocation_strategy = "lowest-price"
          }
        }
        instance_type_configs {
          weighted_capacity = 1
          instance_type     = "r7g.xlarge"
        }
      }
      configurations_json = <<EOF
        [
          {
            "Classification": "hive-site",
            "Properties": {
              "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
            }
          },
          {
            "Classification": "trino-connector-hive",
            "Properties": {
              "hive.metastore": "glue"
            }
          }
        ]
      EOF
      placement_group_config = [
        {
          instance_role      = "MASTER"
          placement_strategy = "SPREAD"
        }
      ]
      # ...
    }
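
    As a possible variation on the code above, the same two classifications could be built with Terraform's built-in jsonencode function instead of a heredoc, so that Terraform checks the syntax at plan time. This is only a sketch: the local name emr_configurations is hypothetical and not part of the original setup, and the resource body is abbreviated, so the remaining arguments from the code above still need to be filled in.

    ```hcl
    locals {
      # Hypothetical local value, introduced only for this sketch.
      emr_configurations = [
        {
          # Equivalent of "Use for Hive table metadata" in the console.
          Classification = "hive-site"
          Properties = {
            "hive.metastore.client.factory.class" = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
          }
        },
        {
          # Makes the Trino Hive connector use Glue as its metastore.
          Classification = "trino-connector-hive"
          Properties = {
            "hive.metastore" = "glue"
          }
        }
      ]
    }

    resource "aws_emr_cluster" "hm_amazon_emr_cluster" {
      # ... same arguments as in the code above ...

      # jsonencode produces the JSON string that configurations_json expects.
      configurations_json = jsonencode(local.emr_configurations)
    }
    ```

    The resulting JSON should match the heredoc version, so this is a matter of preference rather than a functional change.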
    
