Tags: python, sed, gzip, boto, stringio

Python: gzip a file in memory and upload to S3


I am using Python 2.7.

I am trying to cat two log files, pull out data for specific dates using sed, then compress the result and upload it to S3, all without creating any temp files on the system.

sed_command = "sed -n '/{}/,/{}/p'".format(last_date, last_date)

Flow:

  1. Cat the two files. Example: cat file1 file2
  2. Run the sed manipulation in memory (see the sketch after this list).
  3. Compress the result in memory with zip or gzip.
  4. Upload the compressed result to S3.
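For steps 1 and 2, one way to keep everything in memory is to pipe cat into sed with subprocess and capture the output as a byte string. This is a minimal sketch, where filtered_log_bytes is a hypothetical helper name and last_date is the variable from the sed command above:

    import subprocess

    # Hypothetical helper: run `cat file1 file2 | sed -n '/DATE/,/DATE/p'`
    # and return the matching lines as an in-memory string, no temp files.
    def filtered_log_bytes(paths, last_date):
        cat = subprocess.Popen(['cat'] + paths, stdout=subprocess.PIPE)
        sed = subprocess.Popen(
            ['sed', '-n', '/{}/,/{}/p'.format(last_date, last_date)],
            stdin=cat.stdout, stdout=subprocess.PIPE)
        cat.stdout.close()  # let cat receive SIGPIPE if sed exits early
        out, _ = sed.communicate()
        return out

    data = filtered_log_bytes(['file1', 'file2'], last_date)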

I have already done this successfully by creating temp files on the system and removing them once the upload to S3 completes, but I could not find a working solution that does it all on the fly without creating any temp files.


Solution

  • Here's the gist of it:

    import sys
    import gzip
    import cStringIO
    import boto.s3.connection
    import boto.s3.key

    conn = boto.s3.connection.S3Connection(aws_key, secret_key)
    bucket = conn.get_bucket(bucket_name, validate=True)
    # Gzip whatever arrives on stdin into an in-memory buffer.
    buffer = cStringIO.StringIO()
    writer = gzip.GzipFile(None, 'wb', 6, buffer)  # name, mode, level, fileobj
    writer.write(sys.stdin.read())
    writer.close()
    # Rewind the buffer and upload its contents as the S3 key.
    buffer.seek(0)
    boto.s3.key.Key(bucket, key_path).set_contents_from_file(buffer)
    buffer.close()
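  • Since the gist reads sys.stdin, one way to wire in steps 1 and 2 without a shell pipeline is to feed it the in-memory bytes directly. A sketch, reusing the hypothetical filtered_log_bytes helper from the question and the same placeholder credentials:

        import gzip
        import cStringIO
        import boto.s3.connection
        import boto.s3.key

        # Same gzip-to-buffer idea, but taking the bytes as an argument
        # (e.g. the output of filtered_log_bytes) instead of stdin.
        def gzip_and_upload(data, aws_key, secret_key, bucket_name, key_path):
            buffer = cStringIO.StringIO()
            writer = gzip.GzipFile(None, 'wb', 6, buffer)
            writer.write(data)
            writer.close()
            buffer.seek(0)
            conn = boto.s3.connection.S3Connection(aws_key, secret_key)
            bucket = conn.get_bucket(bucket_name, validate=True)
            boto.s3.key.Key(bucket, key_path).set_contents_from_file(buffer)
            buffer.close()

    The buffer.seek(0) is the detail that trips people up: GzipFile leaves the file position at the end of the compressed stream, and set_contents_from_file reads from the current position by default.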