Search code examples
amazon-web-servicesamazon-s3aws-cloudformationamazon-athena

Connecting Athena and S3 in same Cloudformation Stack


From the documentation, AWS::Athena::NamedQuery, it is unclear how to attach Athena to an S3 bucket specified in the same stack.

If I had to guess from the example, I would imagine that you can write a template like,

Resources:
  MyS3Bucket:
    Type: AWS::S3::Bucket
       ... other params ...

  AthenaNamedQuery:
    Type: AWS::Athena::NamedQuery
    Properties:
      Database: "db_name"
      Name: "MostExpensiveWorkflow"
      QueryString: >
                    CREATE EXTERNAL TABLE db_name.test_table 
                    (...) LOCATION s3://.../path/to/folder/

Would a template like the above work? Upon stack creation, will the table db_name.test_table be available to run queries on?


Solution

  • Turns out the way you connect the S3 and Athena is to make a Glue table! How silly of me!! Of course Glue is how you connect things!

    Sarcasm aside, this is a template that worked for me when using AWS::Glue::Table and AWS::Glue::Database,

    Resources:
      MyS3Bucket:
        Type: AWS::S3::Bucket
    
      MyGlueDatabase:
        Type: AWS::Glue::Database
        Properties:
          DatabaseInput:
            Name: my-glue-database
            Description: "Glue beats tape"
          CatalogId: !Ref AWS::AccountId
    
      MyGlueTable:
        Type: AWS::Glue::Table
        Properties:
          DatabaseName: !Ref MyGlueDatabase
          CatalogId: !Ref AWS::AccountId
          TableInput:
            Name: my-glue-table
            Parameters: { "classification" : "csv" }
            StorageDescriptor:
              Location:
                Fn::Sub: "s3://${MyS3Bucket}/"
              InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
              OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
              SerdeInfo:
                Parameters: { "separatorChar" : "," }
                SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"
              StoredAsSubDirectories: false
              Columns:
                - Name: column0
                  Type: string
                - Name: column1
                  Type: string
    

    After this, the database and table were in the AWS Athena Console!