Search code examples
amazon-dynamodbaws-glue

AWS Glue Dynamo Extract


I exported DYnamoDB table to S3, and used this guide to import into AThena. Each column is an entire dynam entry.

I get:

Item
{key={s=value1}, key2={n=value2}}
{key={s=value3}, key2={n=value4}}
{key={s=value5}, key2={n=value6}}

When I expected

key     key2
value1  value2
value3  value4
value5  value6

I'm assuming it's something to do with the json path in a custom classifier, but don't know where to go next.


Solution

  • Instead of relying on the crawler you can create your own external table which will allow you to define the schema manually:

    CREATE EXTERNAL TABLE IF NOT EXISTS ddb_exported_table (
      Item struct <id:struct<S:string>,
                   name:struct<S:string>,
                   coins:struct<N:string>>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION 's3://my-dynamodb-export-bucket/AWSDynamoDB/{EXPORT_ID}/data/'
    TBLPROPERTIES ( 'has_encrypted_data'='true');
    

    You can also use a CTAS query to create a table based on your results:

    https://docs.aws.amazon.com/athena/latest/ug/ctas-insert-into-etl.html

    Or perhaps do something as simple as this:

    SELECT
        Item.id.S as id,
        Item.name.S as name,
        Item.coins.N as coins
    FROM ddb_exported_table
    ORDER BY cast(coins as integer) DESC