Search code examples
amazon-web-servicesterraformaws-cloudformationamazon-athenaterraform-provider-aws

Create AWS Athena view programmatically


Can you create views in Amazon Athena? outlines how to create a view using the User Interface.

I'd like to create an AWS Athena View programatically, ideally using Terraform (which calls CloudFormation).

I followed the steps outlined here: https://ujjwalbhardwaj.me/post/create-virtual-views-with-aws-glue-and-query-them-using-athena, however I run into an issue with this in that the view goes stale quickly.

...._view' is stale; it must be re-created.

The terraform code looks like this:

resource "aws_glue_catalog_table" "adobe_session_view" {

  database_name = "${var.database_name}"
  name = "session_view"

  table_type = "VIRTUAL_VIEW"
  view_original_text = "/* Presto View: ${base64encode(data.template_file.query_file.rendered)} */"
  view_expanded_text = "/* Presto View */"

  parameters = {
    presto_view = "true"
    comment = "Presto View"
  }

  storage_descriptor {
    ser_de_info {
      name = "ParquetHiveSerDe"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }

    columns { name = "first_column" type = "string" }
    columns { name = "second_column" type = "int" }
    ...
    columns { name = "nth_column" type = "string" }
}

An alternative I'd be happy to use is the AWS CLI, however aws athena [option] provides no option for this.

I've tried:

  • create-named-query which I have not been able to get working for a statement such as CREATE OR REPLACE VIEW as this doesn't seem to be the intended use case for this command.
  • start-query-execution which asks for an output location, which suggests that this is meant for querying the data and outputting the results, as opposed to making stateful changes/creations. It also seems to be paired with stop-query-execution.

Solution

  • As you suggested, it is definitely possible to create an Athena view programmatically via the AWS CLI using the start-query-execution. As you pointed out, this does require you to provide an S3 location for the results even though you won't need to check the file (Athena will put an empty txt file in the location for some reason).

    Here is an example:

    $ aws athena start-query-execution --query-string "create view my_view as select * from my_table" --result-configuration "OutputLocation=s3://my-bucket/tmp" --query-execution-context "Database=my_database"
    
    {
        "QueryExecutionId": "1744ed2b-e111-4a91-80ea-bcb1eb1c9c25"
    }
    

    You can avoid having the client specify a bucket by creating a workgroup and setting the location there.

    You can check whether your view creation was successful by using the get-query-execution command.

    $ aws --region athena get-query-execution --query-execution-id bedf3eba-55b0-42de-9a7f-7c0ba71c6d9b
    {
        "QueryExecution": {
            "QueryExecutionId": "1744ed2b-e111-4a91-80ea-bcb1eb1c9c25",
            "Query": "create view my_view as select * from my_table",
            "StatementType": "DDL",
            "ResultConfiguration": {
                "OutputLocation": "s3://my-bucket/tmp/1744ed2b-e111-4a91-80ea-bcb1eb1c9c25.txt"
            },
            "Status": {
                "State": "SUCCEEDED",
                "SubmissionDateTime": 1558744806.679,
                "CompletionDateTime": 1558744807.312
            },
            "Statistics": {
                "EngineExecutionTimeInMillis": 548,
                "DataScannedInBytes": 0
            },
            "WorkGroup": "primary"
        }
    }