amazon-web-services aws-lambda aws-codepipeline aws-codebuild aws-codecommit

Copy CSV files from a public Git subdirectory to an S3 Bucket

I see that there are multiple methods to do so, but I was not able to do it using AWS Lambda (I may be missing something there). Any recommendations on the method, and preferably a link to the implementation steps, would be useful. The public Git repository is huge; however, I need only the CSV files from one subdirectory.

Solution

GitHub and most other Git hosting services give you a raw link for every file (for example, https://github.com/thephpleague/csv/raw/master/tests/data/foo.csv). You can pull the file down with your favorite HTTP client in your favorite runtime, so there is no need to clone the whole repository.
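As a rough sketch of that idea in Python (not a definitive implementation): the handler below lists one subdirectory through GitHub's repository contents API, keeps only the CSV files, and streams each one into S3. The bucket name, key prefix, and the `CONTENTS_API` URL (derived from the example link above) are placeholders I made up for illustration, and the function needs an IAM role that allows `s3:PutObject` on the bucket. Unauthenticated calls to the GitHub API are rate limited, so treat this as a starting point.

```python
import json
import urllib.request

# Hypothetical values for illustration only; replace with your own.
CONTENTS_API = "https://api.github.com/repos/thephpleague/csv/contents/tests/data"
BUCKET = "my-csv-bucket"
PREFIX = "csv/"


def csv_entries(listing):
    """Keep only CSV files from a GitHub contents-API directory listing."""
    return [
        (item["name"], item["download_url"])
        for item in listing
        if item.get("type") == "file" and item["name"].lower().endswith(".csv")
    ]


def copy_to_s3(s3_client, url, bucket, key, opener=urllib.request.urlopen):
    """Stream one file from url into s3://bucket/key without writing to /tmp."""
    with opener(url) as resp:
        s3_client.upload_fileobj(resp, bucket, key)


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3")
    # Fetch the JSON listing of the subdirectory (default branch).
    with urllib.request.urlopen(CONTENTS_API) as resp:
        listing = json.load(resp)

    copied = []
    for name, url in csv_entries(listing):
        copy_to_s3(s3, url, BUCKET, PREFIX + name)
        copied.append(name)
    return {"copied": copied}
```

Because the response body is streamed straight into `upload_fileobj`, nothing is staged on local disk, which sidesteps the storage limits for ordinary-sized CSV files.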

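If the file is too large to fit in the 512 MB of /tmp storage that a Lambda function gets by default, you can raise the function's ephemeral storage setting (configurable up to 10,240 MB) or mount an EFS file system (https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/).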

And if the file is so large that you cannot download it within Lambda's 15-minute timeout, you can download it in parts across multiple Lambda invocations and save the resume state on EFS. You can also store the resume information in the function's /tmp folder, but you will only get it back if the next invocation arrives soon enough to reuse the same warm execution environment, which is not guaranteed.
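One way the download-in-parts idea could look, assuming the server honors HTTP `Range` requests (the sketch raises if it does not). Each call fetches one byte range, appends it to the destination file, and records the new offset in a small state file; both paths would live on the EFS mount (or in /tmp if you accept the risk of losing them). The function names, the state-file format, and the part size are my own choices for illustration.

```python
import urllib.error
import urllib.request


def read_offset(state_path):
    """Return the byte offset saved by a previous invocation, or 0."""
    try:
        with open(state_path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def download_next_part(url, dest_path, state_path, part_size,
                       opener=urllib.request.urlopen):
    """Fetch one byte range, write it at the saved offset, and save the
    new offset. Returns True once there is nothing left to download."""
    offset = read_offset(state_path)
    request = urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{offset + part_size - 1}"}
    )
    try:
        with opener(request) as resp:
            if resp.status != 206:  # server ignored the Range header
                raise RuntimeError("server does not support range requests")
            data = resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 416:  # offset is already at the end of the file
            return True
        raise

    # Write at the recorded offset so a retried part overwrites cleanly.
    with open(dest_path, "r+b" if offset else "wb") as f:
        f.seek(offset)
        f.write(data)
        f.truncate()
    with open(state_path, "w") as f:
        f.write(str(offset + len(data)))

    # A short read means the server ran out of bytes.
    return len(data) < part_size
```

A handler would call `download_next_part` in a loop until it returns True or `context.get_remaining_time_in_millis()` gets low, and then rely on the next invocation to pick up from the saved offset.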

Hope that answers your question.