So I have a very large .csv file in my S3 bucket (2 million+ lines) and I want to import it into DynamoDB.
What I tried:
Lambda: I managed to get the Lambda function to work, but only around 120k lines were imported into DynamoDB before the function timed out.
Pipeline: When using Data Pipeline, it got stuck on "waiting for runner" and then stopped completely.
Here's a serverless approach to process the large .csv in small chunks with two Lambdas and an SQS queue:
Lambda #1 reads the primary keys from the .csv using S3 Select: SELECT s.primary_key FROM S3Object s, querying the .csv in place (see the SelectObjectContent API for details), and sends the keys to the SQS queue in small batches.
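A minimal sketch of Lambda #1 in Python with boto3. The bucket, key, and queue names come from hypothetical environment variables, and it assumes the .csv has a header row with a primary_key column; adjust all of these to your setup.

```python
import json
import os

import boto3

# Hypothetical configuration -- replace with your own values.
BUCKET = os.environ.get("CSV_BUCKET", "my-bucket")
KEY = os.environ.get("CSV_KEY", "data/large-file.csv")
QUEUE_URL = os.environ["QUEUE_URL"]
BATCH_SIZE = 25  # matches the 25-item DynamoDB BatchWriteItem limit

s3 = boto3.client("s3")
sqs = boto3.client("sqs")


def handler(event, context):
    """Lambda #1: stream all primary keys out of the .csv with S3 Select
    and push them to SQS in small batches."""
    resp = s3.select_object_content(
        Bucket=BUCKET,
        Key=KEY,
        ExpressionType="SQL",
        Expression="SELECT s.primary_key FROM S3Object s",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )

    batch, leftover = [], b""
    for ev in resp["Payload"]:  # event stream of result chunks
        if "Records" not in ev:
            continue
        data = leftover + ev["Records"]["Payload"]
        lines = data.split(b"\n")
        leftover = lines.pop()  # a chunk may end mid-line; keep the tail
        for line in lines:
            key = line.decode("utf-8").strip()  # one primary_key per output line
            if not key:
                continue
            batch.append(key)
            if len(batch) == BATCH_SIZE:
                sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(batch))
                batch = []
    if batch:
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(batch))
```

Because this Lambda only streams a single column out of the file, its work stays light even for 2 million+ lines.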
Lambda #2 is triggered by the queue and fetches the matching records from the .csv using S3 Select: SELECT * FROM S3Object s WHERE s.primary_key IN ('id1', 'id2', 'id3'), then writes them to DynamoDB.
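A matching sketch of Lambda #2, assuming the same hypothetical bucket/key variables, a TABLE_NAME environment variable, and a placeholder column list you would replace with your real .csv header:

```python
import csv
import io
import json
import os

import boto3

# Hypothetical configuration -- replace with your own values.
BUCKET = os.environ.get("CSV_BUCKET", "my-bucket")
KEY = os.environ.get("CSV_KEY", "data/large-file.csv")
TABLE_NAME = os.environ["TABLE_NAME"]

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table(TABLE_NAME)

# Assumed column order -- replace with your real .csv header.
COLUMNS = ["primary_key", "col_a", "col_b"]


def handler(event, context):
    """Lambda #2: triggered by SQS; fetch the full rows for a batch of
    primary keys with S3 Select and write them to DynamoDB."""
    for record in event["Records"]:
        keys = json.loads(record["body"])
        # Build the IN (...) list; assumes keys contain no quotes.
        id_list = ",".join(f"'{k}'" for k in keys)
        resp = s3.select_object_content(
            Bucket=BUCKET,
            Key=KEY,
            ExpressionType="SQL",
            Expression=(
                "SELECT * FROM S3Object s "
                f"WHERE s.primary_key IN ({id_list})"
            ),
            InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
            OutputSerialization={"CSV": {}},
        )

        # Collect the streamed CSV output (small: at most 25 rows per message),
        # then parse it row by row.
        payload = b"".join(
            ev["Records"]["Payload"] for ev in resp["Payload"] if "Records" in ev
        )
        rows = csv.reader(io.StringIO(payload.decode("utf-8")))

        with table.batch_writer() as writer:
            for row in rows:
                if not row:
                    continue
                writer.put_item(Item=dict(zip(COLUMNS, row)))
```

The batch_writer() context manager buffers items into 25-item BatchWriteItem calls and resends unprocessed items for you, so each Lambda #2 invocation stays well within its timeout while the queue fans the work out across many invocations.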