Tags: amazon-web-services, amazon-s3, amazon-dynamodb, aws-cli

How can I import DynamoDB table JSON exported to S3?


I have exported a DynamoDB table using Export to S3 in the AWS console. The format is DynamoDB JSON & the file contains 250 items.

I want to import the data into another table.

Since there is no import functionality in the AWS console, I wanted to use the AWS CLI but it seems that this requires another format & is limited to batches of 25 items.

Is there a way to achieve this simply within the AWS CLI?

What is the best way to import the data into another table?

I assume since the AWS console allows you to perform an export, there must be some simple way to import this data.

N.B. Since AWS Data Pipeline is not supported in my region, I can't use it.


Solution

  • Update as of 18th August 2022:

    AWS have now introduced a native way to import DynamoDB JSON (amongst other formats) from S3 into new DynamoDB tables.

    Check out the official announcement: Amazon DynamoDB now supports bulk imports from Amazon S3 to new DynamoDB tables

    Official blog post: Amazon DynamoDB can now import Amazon S3 data into a new table
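    For example, assuming a placeholder bucket, key prefix, table name and a simple string partition key (all hypothetical - adjust to your own export), a minimal sketch of starting such an import via Boto3 looks roughly like this; note that the import always creates a new table, and the same operation is exposed in the CLI as aws dynamodb import-table:

```python
# Minimal sketch of the ImportTable operation via a recent version of Boto3.
# Bucket, key prefix, table name and key schema below are placeholders.
import boto3

dynamodb = boto3.client("dynamodb")

response = dynamodb.import_table(
    S3BucketSource={
        "S3Bucket": "my-export-bucket",
        "S3KeyPrefix": "AWSDynamoDB/01234567890123-abcdefgh/data/",
    },
    InputFormat="DYNAMODB_JSON",
    InputCompressionType="GZIP",  # Export to S3 writes gzipped data files
    TableCreationParameters={     # the import always creates a new table
        "TableName": "my-new-table",
        "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    },
)
print(response["ImportTableDescription"]["ImportArn"])
```

    The import runs asynchronously; you can check its progress with describe_import (or aws dynamodb describe-import) using the returned ImportArn.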


    Old answer for reference.

    TLDR: You have to unmarshall & upload the JSON yourself.


    Since there is no import functionality in the AWS console, I wanted to use the AWS CLI but it seems that this requires another format & is limited to batches of 25 items.

    Correct, the AWS CLI allows you to use batch-write-item to load data into a table - this is where the limit of 25 PUT/DELETE requests per batch comes from - however the command expects the items to be wrapped in a per-table request structure and handed to it 25 at a time.

    The output of Export to Amazon S3 is a set of gzipped, newline-delimited files in DynamoDB's marshalled JSON format, where each line is a single item wrapped in an "Item" key, so it can't be fed to the batch-write-item command directly.
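    To make the mismatch concrete (all values below are made up): each line of an exported data file is one marshalled item wrapped in an "Item" key, whereas batch-write-item wants up to 25 put/delete requests grouped under the target table name:

```python
# One line of an Export-to-S3 data file in DYNAMODB_JSON format (values made up):
exported_line = {"Item": {"pk": {"S": "user#1"}, "name": {"S": "Alice"}}}

# The request-items structure that batch-write-item / BatchWriteItem expects,
# with at most 25 PutRequest/DeleteRequest entries per call:
request_items = {
    "my-new-table": [
        {"PutRequest": {"Item": exported_line["Item"]}},
        # ... up to 24 more PutRequest/DeleteRequest entries
    ]
}
```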

    Is there a way to achieve this simply within the AWS CLI?

    Unfortunately, DynamoDB's own Export to Amazon S3 flow has no equivalent Import from Amazon S3 flow, either in the console or in the CLI.

    Since the AWS Command Line Interface is essentially an interface to AWS's SDK for Python (Boto3), the gap isn't specific to the CLI: the SDKs don't offer a bulk import of the exported data either, which ultimately means that the underlying DynamoDB API does not support it.

    What is the best way to import the data into another table?

    The solution is to create a quick prototype that takes the uncompressed JSON files, unmarshals each item using a suitable SDK method (e.g. the unmarshall helper in the JavaScript SDK, or TypeDeserializer in Boto3) and then uploads the unmarshalled items to the table.

    For the upload itself you can use either the CLI or the DynamoDB SDK for whichever language your prototype application is written in.

    The AWS CLI has no built-in way to transform the export files into something batch-write-item can consume, which is why you need your own prototype application.
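    As a rough sketch of such a prototype in Python (the file name and table name are placeholders, and it assumes you have downloaded the export's .json.gz data files locally), Boto3's TypeDeserializer does the unmarshalling and the Table resource's batch_writer takes care of the batching:

```python
# Rough prototype: unmarshal a DynamoDB Export-to-S3 data file and load it
# into another table. The file name and table name below are placeholders.
import gzip
import json

import boto3
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
table = boto3.resource("dynamodb").Table("my-new-table")

with gzip.open("export-data-file.json.gz", "rt") as f, table.batch_writer() as batch:
    for line in f:
        # Each line looks like: {"Item": {"pk": {"S": "..."}, ...}}
        marshalled = json.loads(line)["Item"]
        # Convert the DynamoDB-typed attributes into plain Python values
        item = {key: deserializer.deserialize(value) for key, value in marshalled.items()}
        # batch_writer groups the puts into BatchWriteItem calls behind the
        # scenes and retries any unprocessed items for you
        batch.put_item(Item=item)
```

    Using batch_writer rather than hand-rolled BatchWriteItem calls means you never have to manage the 25-item grouping or unprocessed-item retries yourself.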


    I encountered this same issue a while back.

    I hope that AWS eventually, at a minimum, supports (un)marshalling of DynamoDB JSON via the CLI - it is already doable via Boto3, so I'm not sure why it hasn't been surfaced in the CLI.

    Ultimately, however, this is a gap in AWS's current offering: there should be a simple Import from S3 API, with supporting SDK implementations, CLI functionality and a console interface.

    This would also remove the cost of doing the upload yourself: since the export feature does not consume read capacity, you would hope that a native import feature would not consume write capacity either.


    I might write a small open-source cross-platform console app for unmarshalling & doing the batch upload...