
Upload content from GET request to Cloud Storage


I'm currently working with an API that generates CSV files as its output, and the only way to retrieve them is with requests.get, such as:

    raw_report_data = requests.get(report_url).content.decode('utf-8')

We then upload these files to a Google Cloud Storage bucket, and the GCP documentation describes several ways of doing that.

I'd like to avoid downloading the whole report locally only to upload it to our GCP bucket. I'm aware that requests.get accepts a stream=True argument, which downloads the content gradually, but I can't make it work with a streaming upload to Cloud Storage.

Here is a code snippet showing what I'm trying to do. I'm using a dummy CSV to simplify the API part, so we can focus on the problem:


    import requests
    from google.cloud import storage


    url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"

    # GCP info
    client = storage.Client(project="my-project")
    bucket = client.get_bucket('my-bucket')
    target_blob = bucket.blob("test/report_01.csv")

    with requests.get(url, stream=True) as f:
        target_blob.upload_from_file(f)

This code raises the following error:

    AttributeError: 'Response' object has no attribute 'tell'

I think that I'm trying to join two incompatible things, but I'd appreciate any ideas, even if it's to tell me that this can't be done.

Extras:

  • I'm aware of a similar question here on SO (Upload file to cloud storage from request without saving it locally), but the only answer uses a file.read() method and, as far as I can tell, it reads the whole document before the upload. I want to upload the content while it's being downloaded, to avoid unnecessary use of local storage.

Solution

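  • You are getting that error because upload_from_file expects a file-like object, and the Response object returned by requests.get is not one. The upload code evidently tries to call tell() on whatever it is given, and Response has no such attribute or method.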

    Based on the requests documentation, you can use response.text to read the response body as text, or response.json() if you are dealing with data in JSON format. If you want a raw stream of bytes instead, use response.raw and pass stream=True when making the request.

    Since you want a streaming upload with upload_from_file, you can try passing it response.raw, which is a file-like object, instead of the Response itself. Here’s an example:

    import requests
    from google.cloud import storage
    
    
    url = "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv"
    
    # GCP info
    client = storage.Client(project="my-project")
    bucket = client.get_bucket('my-bucket')
    target_blob = bucket.blob("test/report_01.csv")
    
    with requests.get(url, stream=True) as f:
        # f.raw is the underlying file-like byte stream, not the Response itself
        target_blob.upload_from_file(f.raw)
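
The snippet above can be hardened a little. What follows is a minimal sketch, not a definitive implementation: the function names (upload_via_raw, upload_via_chunks, upload_report) and the "text/csv" content type are assumptions made for illustration. It checks the HTTP status before uploading and sets decode_content on the raw stream, because response.raw does not undo gzip/deflate Content-Encoding by default. The second helper is an alternative that assumes a google-cloud-storage release providing Blob.open, and copies the body chunk by chunk with iter_content.

```python
def upload_via_raw(response, blob, content_type="text/csv"):
    """Pipe a Response obtained with stream=True into a Cloud Storage blob."""
    # Fail early on 4xx/5xx instead of storing an error page as the report.
    response.raise_for_status()
    # Ask the underlying urllib3 stream to undo gzip/deflate encoding,
    # so the stored object is the plain CSV rather than compressed bytes.
    response.raw.decode_content = True
    blob.upload_from_file(response.raw, content_type=content_type)


def upload_via_chunks(response, blob, chunk_size=256 * 1024):
    """Alternative: copy the body in chunks through a blob writer.

    Assumes a google-cloud-storage version that provides Blob.open().
    """
    response.raise_for_status()
    with blob.open("wb") as writer:
        # iter_content yields the (decoded) body a chunk at a time.
        for chunk in response.iter_content(chunk_size=chunk_size):
            writer.write(chunk)


def upload_report(url, project, bucket_name, blob_name):
    """Hypothetical entry point wiring the helper to real clients."""
    # Imported here so the helpers above stay free of import-time dependencies.
    import requests
    from google.cloud import storage

    client = storage.Client(project=project)
    target_blob = client.bucket(bucket_name).blob(blob_name)
    with requests.get(url, stream=True, timeout=60) as response:
        upload_via_raw(response, target_blob)
```

With the values from the question, this would be called as upload_report(url, "my-project", "my-bucket", "test/report_01.csv"). In both helpers, the report is never written to local disk and is not read into memory as a whole by this code.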