Tags: amazon-web-services, amazon-athena, amazon-quicksight

AWS QuickSight can't see Athena DB in another region


My Athena DB is in the ap-south-1 region, and AWS QuickSight doesn't exist in that region.

How can I connect QuickSight with Athena in that case?


Solution

  • All you need to do is copy the table definitions from one region to the other. There are several ways to do that:

    With the AWS Console

    This approach is the simplest one and doesn't require any additional setup, as everything is based on Athena DDL statements.

    1. Get the table definition with:
      SHOW CREATE TABLE `database`.`table`;
      
      This should output something like:
      CREATE EXTERNAL TABLE `database`.`table`(
        `col_1` string, 
        `col_2` bigint, 
        ... 
        `col_n` string)
      ROW FORMAT SERDE 
        'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
      STORED AS INPUTFORMAT 
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
      OUTPUTFORMAT 
        'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION
        's3://some/location/on/s3'
      TBLPROPERTIES (
        'classification'='parquet',
        ... 
        'compressionType'='gzip')
      
    2. Switch to the desired region.
    3. Create a database where you want to store the table definitions (e.g. CREATE DATABASE IF NOT EXISTS my_database;), or use the default one.
    4. Execute the statement produced by SHOW CREATE TABLE. Note that you might need to change the database name to match the one created in the previous step.
    5. If your table is partitioned, then you will need to load all the partitions. If the data on S3 adheres to the Hive partitioning style, i.e.

      s3://some/location/on/s3
      |
      ├── day=01
      |   ├── hour=00
      |   └── hour=01
      ...
      

      then you can use

      MSCK REPAIR TABLE `database`.`table`
      

      Alternatively, you can load partitions one by one

      ALTER TABLE `database`.`table`
      ADD PARTITION (day='01', hour='00')
      LOCATION 's3://some/location/on/s3/day=01/hour=00';
      
      ALTER TABLE `database`.`table`
      ADD PARTITION (day='01', hour='01')
      LOCATION 's3://some/location/on/s3/day=01/hour=01';
      
      ...
      

    With the AWS API

    You can use an AWS SDK, e.g. boto3 for Python, which provides an easy-to-use, object-oriented API. Here you have two options:

    1. Use the Athena client. As in the previous approach, you would need to get the table definition statement from the AWS Console, but all the other steps can be done in a scripted manner with the start_query_execution method of the Athena client, as in the sketch below. There are plenty of resources online, e.g. this one.
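
      A minimal sketch, assuming the destination region is us-east-1 and using placeholder names for the destination database and the query-results bucket:

      import boto3
      
      # Athena client in the destination region
      athena = boto3.client("athena", region_name="us-east-1")
      
      # Paste the DDL produced by SHOW CREATE TABLE in the source region
      ddl = """CREATE EXTERNAL TABLE `database`.`table`(...)"""
      
      response = athena.start_query_execution(
          QueryString=ddl,
          QueryExecutionContext={"Database": "database"},
          ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
      )
      print(response["QueryExecutionId"])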

    2. Use the AWS Glue client. This method is based solely on operations within the AWS Glue Data Catalog, which is used by Athena during query execution. The main idea is to create two Glue clients, one for the source and one for the destination catalog. For example:

      import boto3
      KEY_ID = "__KEY_ID__"
      SECRET = "__SECRET__"
      
      glue_source = boto3.client(
          'glue',
          region_name="ap-south-1",
          aws_access_key_id=KEY_ID,
          aws_secret_access_key=SECRET
      )
      
      glue_destination = boto3.client(
          'glue',
          region_name="us-east-1",
          aws_access_key_id=KEY_ID,
          aws_secret_access_key=SECRET
      )
      
      # Alternatively, create the clients from sessions with named profiles
      glue_source = boto3.session.Session(profile_name="profile_for_ap_south_1").client("glue")
      glue_destination = boto3.session.Session(profile_name="profile_for_us_east_1").client("glue")
      

      Then you would need to use the get_* and create_* type methods, e.g. get_table and create_table. This also requires parsing the responses you get back from the Glue clients, as in the sketch below.
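
      A minimal sketch of copying a single table definition, assuming placeholder database and table names. Note that get_table returns read-only fields (CreateTime, CreatedBy, etc.) that create_table won't accept, so the definition is filtered down to the writable keys first:

      # Fetch the table definition from the source catalog
      table = glue_source.get_table(DatabaseName="database", Name="table")["Table"]
      
      # Keep only the fields that create_table accepts in TableInput
      writable = ("Name", "Description", "Owner", "Retention", "StorageDescriptor",
                  "PartitionKeys", "TableType", "Parameters")
      table_input = {k: v for k, v in table.items() if k in writable}
      
      # Create the table in the destination catalog (the database must already exist there)
      glue_destination.create_table(DatabaseName="database", TableInput=table_input)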

    With AWS Glue crawlers

    Although you can use AWS Glue crawlers to "rediscover" the data on S3, I wouldn't recommend this approach since you already know the structure of your data. If you still want to try it, see the sketch below.
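
    A minimal sketch with boto3, reusing glue_destination from above; the crawler name is hypothetical, and the IAM role ARN and S3 path are placeholders (the role needs Glue and S3 permissions):

      # Create and start a crawler in the destination region
      glue_destination.create_crawler(
          Name="rediscover-table",  # hypothetical crawler name
          Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role
          DatabaseName="database",
          Targets={"S3Targets": [{"Path": "s3://some/location/on/s3/"}]},
      )
      glue_destination.start_crawler(Name="rediscover-table")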