I had a scenario where I needed to verify the checksum (MD5) of a file stored in an S3 bucket. This can be done at upload time by supplying the checksum value in the API call (for example via the Content-MD5 header). In my case, though, I wanted to verify the checksum programmatically after the data had already been put into the bucket. Every object in S3 has an attribute called 'ETag'. For objects uploaded in a single PUT (not a multipart upload, and not encrypted with SSE-KMS or SSE-C), the ETag is the MD5 checksum of the object's content as calculated by S3.
Is there any way to get the ETag of a specific object and compare the checksum of the local file with that of the file stored in S3, using the boto3 client in a Python script?
Boto3 provides a way to get the metadata of an object stored in S3. The following snippet fetches that metadata programmatically:
>>> import boto3
>>> import pprint
>>> s3_cli = boto3.client('s3')
>>> s3_resp = s3_cli.head_object(Bucket='ventests3', Key='config/ctl.json')
>>> pprint.pprint(s3_resp)
{u'AcceptRanges': 'bytes',
u'ContentLength': 4325,
u'ContentType': 'binary/octet-stream',
u'ETag': '"040c003386f1e2001816d32f2125d07a"',
u'LastModified': datetime.datetime(2018, 9, 20, 7, 15, 3, tzinfo=tzutc()),
u'Metadata': {},
'ResponseMetadata': {'HTTPHeaders': {'accept-ranges': 'bytes',
'content-length': '4325',
'content-type': 'binary/octet-stream',
'date': 'Thu, 20 Sep 2018 07:20:53 GMT',
'etag': '"040c003386f1e2001816d32f2125d07a"',
'last-modified': 'Thu, 20 Sep 2018 07:15:03 GMT',
'server': 'AmazonS3',
'x-amz-id-2': 'P2wapOciWCKPfol2sBgoo11tRdr4KwKcDJ/nHW7LZn00mvKfMYyfAPPV2tIcf3Vu+lrV57NBARY=',
'x-amz-request-id': '42AF970E7C9AA18C'},
'HTTPStatusCode': 200,
'HostId': 'P2wapOciWCKPfol2sBgoo11tRdr4KwKcDJ/nHW7LZn00mvKfMYyfAPPV2tIcf3Vu+lrV57NBARY=',
'RequestId': '42AF970E7C9AA18C',
'RetryAttempts': 0}}
>>> s3obj_etag = s3_resp['ETag'].strip('"')
>>> print(s3obj_etag)
040c003386f1e2001816d32f2125d07a
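The head_object() method of the S3 client fetches the metadata (headers) of a given object stored in the S3 bucket without downloading the object body. The ETag value comes back wrapped in double quotes, which is why the snippet strips them before using it.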
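To complete the comparison with the local file, the ETag can be checked against an MD5 computed locally with hashlib. The following is a minimal sketch, not a definitive implementation: the helper names local_md5 and etag_matches_local_file are hypothetical, and it assumes the object was uploaded in a single PUT so that its ETag is a plain MD5 digest. A multipart ETag contains a '-' followed by the part count and cannot be compared this way, so the sketch raises an error in that case.

```python
import hashlib


def local_md5(path, chunk_size=8192):
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches_local_file(s3_client, bucket, key, path):
    """Compare an S3 object's ETag with the MD5 of a local file.

    Assumption: the ETag is a plain MD5 digest (single-part upload,
    no SSE-KMS or SSE-C encryption).
    """
    # head_object returns only the headers; the ETag is wrapped in double quotes.
    etag = s3_client.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    if "-" in etag:
        # Multipart ETags look like "<hash>-<part count>" and are not a file MD5.
        raise ValueError("ETag %r looks like a multipart ETag, not a plain MD5" % etag)
    return etag == local_md5(path)
```

With a real client this could be called as etag_matches_local_file(boto3.client('s3'), 'ventests3', 'config/ctl.json', 'ctl.json'), where the local path 'ctl.json' is only an illustrative assumption. Passing the client in as an argument keeps the helper easy to exercise with a stub in place of a live S3 connection.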