amazon-web-services amazon-ec2 amazon-s3 cloud s3cmd

Zipping a large amount of data stored as files on Amazon S3


I have a large amount of data stored on Amazon S3 in the form of objects.

For example, I have a user with 200+ GB of photos (about 100000+ objects) stored on Amazon S3. Each object is a photo, and the average object size is 5 MB.

Now I want to give the user a link to download their data.

This is what I am currently doing:

  1. Using s3cmd, I copy all the objects from S3 to EC2.
  2. Then, using the zip or tar command, I create an archive.
  3. After the zip process is complete, I move the zip file back to S3.
  4. Then I create a signed link that I send to the user by email.

But this process is very slow, and most of the time it runs into out-of-memory and storage issues.
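
To illustrate where the memory pressure in the zip step can come from, here is a minimal Python sketch that writes objects into a zip archive one chunk at a time instead of loading whole files. It is only an illustration under assumptions, not what s3cmd or the zip command do: `zip_objects`, `open_object`, and `CHUNK_SIZE` are hypothetical names, and `open_object` stands in for whatever returns a readable stream for a key (for example, a wrapper around an S3 download call).

```python
import io
import zipfile
from contextlib import closing

CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size: read 1 MiB at a time


def zip_objects(keys, open_object, out):
    """Stream each object into one zip archive written to `out`.

    `open_object(key)` is an assumed callable returning a readable binary
    stream with a close() method. Only one chunk is held in memory at a time.
    Returns the number of objects written.
    """
    count = 0
    # Photos are usually already compressed, so ZIP_STORED skips recompression.
    with zipfile.ZipFile(out, mode="w", compression=zipfile.ZIP_STORED) as archive:
        for key in keys:
            with closing(open_object(key)) as source, archive.open(key, mode="w") as entry:
                while True:
                    chunk = source.read(CHUNK_SIZE)
                    if not chunk:
                        break
                    entry.write(chunk)
            count += 1
    return count


if __name__ == "__main__":
    # In-memory stand-in for a bucket, purely for demonstration.
    fake_bucket = {"photos/a.jpg": b"\xff\xd8" * 4, "photos/b.jpg": b"\xff\xd8" * 2}
    buffer = io.BytesIO()
    written = zip_objects(sorted(fake_bucket), lambda k: io.BytesIO(fake_bucket[k]), buffer)
    print(written, "objects,", len(buffer.getvalue()), "bytes in archive")
```

With this shape, memory use stays around one chunk per object regardless of the total archive size. Disk or upload capacity for the archive itself is still needed.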

I need to know:

  1. Is there any way I can speed up this process?
  2. Is there any third-party service or tool that can quickly create a zip of my files and send it to the user?
  3. Or is there any other third-party solution? I am ready to pay for it.

Solution

  • Try using EMR (Elastic MapReduce) with S3DistCp, which can be helpful in your situation. For EMR, you have to create a cluster and then run your job on it.
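
As a rough sketch of what running the job could look like, the Python snippet below builds an EMR step definition that invokes S3DistCp through `command-runner.jar`. The helper name `s3distcp_step` and the bucket paths are hypothetical, and the step assumes a cluster that already exists. Note that S3DistCp is a parallel copy tool; as far as I know it copies (and can concatenate or compress) objects rather than producing a zip archive, so the archiving step would still need to happen somewhere.

```python
def s3distcp_step(src, dest, name="Copy objects with S3DistCp"):
    """Build an EMR step dict that runs s3-dist-cp from `src` to `dest`.

    `src` and `dest` are S3 or HDFS URIs supplied by the caller; the values
    used in the demo below are placeholders, not real buckets.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", "--src", src, "--dest", dest],
        },
    }


if __name__ == "__main__":
    step = s3distcp_step(
        "s3://example-source-bucket/photos/",  # hypothetical source prefix
        "hdfs:///photos/",                     # hypothetical destination on the cluster
    )
    print(step["HadoopJarStep"]["Args"])
```

A step shaped like this could then be submitted to a running cluster, for example with the boto3 EMR client's `add_job_flow_steps(JobFlowId=..., Steps=[step])`, where the cluster ID is whatever your own cluster reports.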