python, amazon-glacier, boto3

Accessing stream in job.get_output('body')


Sample code

import boto3

glacier = boto3.resource('glacier')
job = glacier.Job(accountID, vaultlist[0], id=joblist[0])

r0 = job.get_output()
print(r0['body'])

That print only yields a botocore.response.StreamingBody object at 0xsnip, not its contents.

r0['body'] should be the inventory in CSV format, but I can't figure out how to get at it. I spent a bit of time trying to use io to read in the stream, and either that is not the right way or I did it wrong. Can you point me in the right direction?

Thanks!


Solution

  • OK, I couldn't get the other way to work at all, mostly through my own lack of skills, I'm sure. But I was able to use an HTTP GET to download the inventory into a file, and this is how I did it. You will see lots of lists in the code because I had two vaults with one job each; you could modify this to loop over them, or just use [0] for both lists if you have one vault and one job. The important part is the Amazon EC2 request-signing sample (linked in the code comments), which I modified to retrieve the inventory from a completed Glacier job.

    I know my code is not very well written, but it worked for my one-shot need. I hope this is helpful to others.

    import requests, sys, os, hashlib, hmac, json
    from datetime import datetime
    
    # ************* REQUEST VALUES *************
    method = 'GET'
    service = 'glacier'
    region = '<YOUR_REGION>'
    host = 'glacier.' + region + '.amazonaws.com'
    endpoint = 'https://glacier.' + region + '.amazonaws.com'
    request_parameters = ''
    accountid = '<YOUR_ACCOUNT_ID>'
    vaultlist = ["VAULT_ONE", "VAULT_TWO"]
    joblist = ['JOB_ID_ONE',
               'JOB_ID_TWO']
    rangelist = ['JOB_SIZE_ONE',
                 'JOB_SIZE_TWO',]
    url0 = "/" + accountid + "/vaults/" + vaultlist[0] + "/jobs/" + joblist[0] + "/output"
    url1 = "/" + accountid + "/vaults/" + vaultlist[1] + "/jobs/" + joblist[1] + "/output"
    filename = ['archive0.json', 'archive1.json']  # one output file per job
    # Key derivation functions. See:
    # http://docs.aws.amazon.com/general/latest/gr/signature-v4-examples.html#signature-v4-examples-python
    def sign(key, msg):
        return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()
    
    def getSignatureKey(key, dateStamp, regionName, serviceName):
        kDate = sign(('AWS4' + key).encode('utf-8'), dateStamp)
        kRegion = sign(kDate, regionName)
        kService = sign(kRegion, serviceName)
        kSigning = sign(kService, 'aws4_request')
        return kSigning
    
    # Read AWS access key from env. variables or configuration file. Best practice is NOT
    # to embed credentials in code.
    access_key = os.environ.get('AWS_ACCESS_KEY')
    secret_key = os.environ.get('AWS_SECRET_KEY')
    if access_key is None or secret_key is None:
        print('No access key is available via your environment variables.')
        sys.exit()
    
    # Create a date for headers and the credential string
    t = datetime.utcnow()
    amzdate = t.strftime('%Y%m%dT%H%M%SZ')
    datestamp = t.strftime('%Y%m%d') # Date w/o time, used in credential scope
    
    # ************* TASK 1: CREATE A CANONICAL REQUEST *************
    # http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html
    
    # Step 1 is to define the verb (GET, POST, etc.)--already done.
    
    # Step 2: Create canonical URI--the part of the URI from domain to query
    # string (use '/' if no path)
    canonical_uri = url1
    
    # Step 3: Create the canonical query string. In this example (a GET request),
    # request parameters are in the query string. Query string values must
    # be URL-encoded (space=%20). The parameters must be sorted by name.
    # For this example, the query string is pre-formatted in the request_parameters variable.
    canonical_querystring = request_parameters
    
    # Step 4: Create the canonical headers and signed headers. Header names
    # and value must be trimmed and lowercase, and sorted in ASCII order.
    # Note that there is a trailing \n.
    canonical_headers = 'host:' + host + '\n' + 'x-amz-date:' + amzdate + '\n'
    
    # Step 5: Create the list of signed headers. This lists the headers
    # in the canonical_headers list, delimited with ";" and in alpha order.
    # Note: The request can include any headers; canonical_headers and
    # signed_headers lists those that you want to be included in the
    # hash of the request. "Host" and "x-amz-date" are always required.
    signed_headers = 'host;x-amz-date'
    
    # Step 6: Create payload hash (hash of the request body content). For GET
    # requests, the payload is an empty string ("").
    payload_hash = hashlib.sha256("".encode()).hexdigest()
    
    # Step 7: Combine elements to create the canonical request
    canonical_request = method + '\n' + canonical_uri + '\n' + canonical_querystring + '\n' + canonical_headers +\
                        '\n' + signed_headers + '\n' + payload_hash
    
    # ************* TASK 2: CREATE THE STRING TO SIGN*************
    # Match the algorithm to the hashing algorithm you use, either SHA-1 or
    # SHA-256 (recommended)
    algorithm = 'AWS4-HMAC-SHA256'
    credential_scope = datestamp + '/' + region + '/' + service + '/' + 'aws4_request'
    string_to_sign = algorithm + '\n' +  amzdate + '\n' +  credential_scope + '\n' + \
                     hashlib.sha256(canonical_request.encode()).hexdigest()
    
    
    # ************* TASK 3: CALCULATE THE SIGNATURE *************
    # Create the signing key using the function defined above.
    signing_key = getSignatureKey(secret_key, datestamp, region, service)
    
    # Sign the string_to_sign using the signing_key
    signature = hmac.new(signing_key, string_to_sign.encode('utf-8'), hashlib.sha256).hexdigest()
    
    
    # ************* TASK 4: ADD SIGNING INFORMATION TO THE REQUEST *************
    # The signing information can be either in a query string value or in
    # a header named Authorization. This code shows how to use a header.
    # Create authorization header and add to request headers
    authorization_header = algorithm + ' ' + 'Credential=' + access_key + '/' + credential_scope + ', ' +\
                           'SignedHeaders=' + signed_headers + ', ' + 'Signature=' + signature
    
    # The request can include any headers, but MUST include "host", "x-amz-date",
    # and (for this scenario) "Authorization". "host" and "x-amz-date" must
    # be included in the canonical_headers and signed_headers, as noted
    # earlier. Order here is not significant.
    # Python note: The 'host' header is added automatically by the Python 'requests' library.
    # headers = {'x-amz-date':amzdate, 'Authorization':authorization_header}
    
    
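    # Range is optional for Get Job Output. When it is sent, it must have the
    # form 'bytes=<start>-<end>' with an inclusive end, so the rangelist entry
    # should be the job size minus 1.
    headers0 = {'x-amz-date': amzdate,
                'Authorization': authorization_header,
                'x-amz-glacier-version': '2012-06-01',
                'Range': 'bytes=0-' + rangelist[0],
                }
    headers1 = {'x-amz-date': amzdate,
                'Authorization': authorization_header,
                'x-amz-glacier-version': '2012-06-01',
                'Range': 'bytes=0-' + rangelist[1],
                }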
    headers = headers1
    
    # ************* SEND THE REQUEST *************
    request_url = endpoint + url1
    print(url1)  # path that was signed above (canonical_uri)
    print('\nBEGIN REQUEST++++++++++++++++++++++++++++++++++++')
    print('Request URL: ' + request_url + '\n')
    print('Headers: ' + json.dumps(headers))
    print('Auth : ' + authorization_header + '\n' )
    r = requests.get(request_url, headers=headers, stream = True)
    
    print('\nRESPONSE++++++++++++++++++++++++++++++++++++')
    print('Response code: %d\n' % r.status_code)
    # print(r.text) #This is in the original Sample and useful for debugging. But not if your inventory is large.
    
    
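    # *********** Write it to file ***********
    # The request used stream=True, so copy the body to disk in chunks rather
    # than loading a large inventory into memory with r.text.
    with open(filename[1], mode='wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)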
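
For anyone who would rather stay with boto3: the 'body' value returned by job.get_output() is a file-like botocore StreamingBody, so calling read() on it should return the inventory bytes. The sketch below is only an illustration of that idea, not the code used above. The helper name save_stream, the chunk size, and the file names are assumptions, and an in-memory io.BytesIO stands in for the real response so the snippet runs without AWS credentials.

```python
import io
import os
import tempfile


def save_stream(body, path, chunk_size=1024 * 1024):
    """Copy a file-like response body to `path` in chunks; return bytes written.

    `body` only needs a read(n) method, which botocore's StreamingBody provides.
    """
    total = 0
    with open(path, 'wb') as f:
        while True:
            chunk = body.read(chunk_size)
            if not chunk:  # an empty bytes object means the stream is exhausted
                break
            f.write(chunk)
            total += len(chunk)
    return total


# Hypothetical usage with the objects from the question (needs AWS access):
#   r0 = job.get_output()
#   save_stream(r0['body'], 'inventory.csv')

# Local demonstration: a BytesIO object stands in for the StreamingBody.
sample = b'ArchiveId,Size\nabc,123\n'
demo_path = os.path.join(tempfile.gettempdir(), 'inventory_demo.csv')
written = save_stream(io.BytesIO(sample), demo_path)
print(written, 'bytes written to', demo_path)
```

A streaming body can only be consumed once, so write it to a file (or keep the result of read()) if the inventory is needed again later.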