Tags: python, google-cloud-platform, google-cloud-storage, fastapi

Error uploading file to Google Cloud Storage


How should the files on my server be uploaded to Google Cloud Storage?

The code I have tried is given below; however, it throws a TypeError, saying that the expected type is not bytes, for this call:

blob.upload_from_file(file.file.read())

even though upload_from_file() requires a binary type.

@app.post("/file/")
async def create_upload_file(files: List[UploadFile] = File(...)):
    storage_client = storage.Client.from_service_account_json('path.json')
    bucket_name = 'data'
    try:
        bucket = storage_client.create_bucket(bucket_name)
    except Exception:
        bucket = storage_client.get_bucket(bucket_name)
    for file in files:        
        destination_file_name = f'{file.filename}'
        new_data = models.Data(
            path=destination_file_name
        )
        try:
            blob = bucket.blob(destination_file_name)
            blob.upload_from_file(file.file.read())
        except Exception:
            raise HTTPException(
                status_code=500,
                detail="File upload failed"
            )

Solution

  • Option 1

    As per the documentation, upload_from_file() (see the docs on streaming uploads as well) supports a file-like object; hence, you could use the .file attribute of UploadFile (which represents a SpooledTemporaryFile instance). Note that if you had already read the file (e.g., via file.file.read(), or await file.read()), the cursor would be left at the end of the stream, with no bytes left to read. Thus, in that case, you should rewind the cursor to the start of the file before attempting to upload it, by calling file.file.seek(0) (or await file.seek(0) in an async def endpoint). Example:

    # Rewind the stream to the beginning. This step can be omitted if the
    # input stream is already at the correct position.
    file.file.seek(0)
    
    # Upload data from the stream to your bucket
    blob.upload_from_file(file.file)  
    

  • Option 2

    You could read the contents of the file and pass them to upload_from_string() (see the docs on uploading objects from memory as well), which supports data in bytes or string format. For instance:

    blob.upload_from_string(file.file.read())
    

    or, since you defined your endpoint with async def (see this answer for def vs async def):

    contents = await file.read()
    blob.upload_from_string(contents)
    

  • Option 3

    For the sake of completeness, upload_from_filename() (see the docs on uploading objects from a file system as well) expects a filename which represents the path to the file. Hence, the No such file or directory error was thrown when you passed file.filename (as mentioned in your comment), as this is not a path to the file. To use that method (as a last resort), you should save the file contents to a NamedTemporaryFile, which "has a visible name in the file system" that "can be used to open the file", and once you are done with it, delete it. Example:

    from tempfile import NamedTemporaryFile
    import os
    
    contents = file.file.read()
    temp = NamedTemporaryFile(delete=False)
    try:
        with temp as f:
            f.write(contents)
        blob.upload_from_filename(temp.name)
    except Exception:
        return {"message": "There was an error uploading the file"}
    finally:
        #temp.close()  # the `with` statement above takes care of closing the file
        os.remove(temp.name)
    

    You could also upload it in chunks concurrently, as in the example provided by Google's python-storage package (the official Python Client for Google Cloud Storage).
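
    If you choose that route, a minimal sketch might look as follows; it assumes a recent google-cloud-storage release that includes the transfer_manager module, and that the data has already been written to a file on the local file system (for instance, the NamedTemporaryFile from Option 3 above), since chunked uploads operate on a file path rather than a file-like object:

    from google.cloud.storage import transfer_manager

    # Split the local file into 32 MiB chunks and upload up to 8 chunks in
    # parallel, reusing the `blob` object from the examples above.
    transfer_manager.upload_chunks_concurrently(
        temp.name,                    # path to the file on disk
        blob,
        chunk_size=32 * 1024 * 1024,  # chunk size in bytes
        max_workers=8,
    )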

    Note 1:

    If you are uploading a rather large file to Google Cloud Storage that may require some time to completely upload, and have encountered a timeout error, consider increasing the amount of time to wait for the server response by changing the timeout value, which, as shown in the upload_from_file() documentation (and that of all the other methods described earlier), defaults to timeout=60 seconds. To change that, use e.g. blob.upload_from_file(file.file, timeout=180), or set timeout=None (meaning that it will wait until the connection is closed).

    Note 2:

    Since all the above methods of the python-storage package perform blocking I/O operations, as can be seen in the source code here, here and here, if you have decided to define your create_upload_file endpoint with async def instead of def (have a look at this answer for more details on def vs async def), you should rather run the "upload file" function in a separate thread to ensure that the main thread (where coroutines are run) does not get blocked. You can do that using Starlette's run_in_threadpool(), which is also used by FastAPI internally (see here as well). For example:

    await run_in_threadpool(blob.upload_from_file, file.file)
    

    Alternatively, you can use asyncio's loop.run_in_executor(), as described in this answer and demonstrated in this sample by python-storage as well.
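
    For instance, a rough equivalent of the run_in_threadpool() call above, using the event loop's default executor:

    import asyncio

    # Run the blocking upload in the default ThreadPoolExecutor, so that the
    # event loop is free to serve other requests while the upload proceeds.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, blob.upload_from_file, file.file)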

    As for Option 3, where you need to open a NamedTemporaryFile and write the contents to it, you can do that using the aiofiles library, as demonstrated in Option 2 of this answer, that is:

    async with aiofiles.tempfile.NamedTemporaryFile("wb", delete=False) as temp:
        contents = await file.read()
        await temp.write(contents)
        #...
    

    and again, run the "upload file" function in an external threadpool:

    await run_in_threadpool(blob.upload_from_filename, temp.name)
    

    Finally, have a look at the answers here and here on how to enclose the I/O operations in try-except-finally blocks, so that you can catch any possible exceptions, as well as close the UploadFile object properly. UploadFile is a temporary file that is deleted from the filesystem when it is closed. To find out where your system keeps the temporary files, see this answer.

    Note: Starlette, as described here, uses a SpooledTemporaryFile with 1MB max_size, meaning that the data is spooled in memory until the file size exceeds 1MB, at which point the contents are written to the temporary directory. Hence, you will only see the file you uploaded showing up in the temp directory, if it is larger than 1MB and if .close() has not yet been called.
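
    Putting the pieces together, below is a rough sketch of a complete endpoint based on Option 2, combined with run_in_threadpool() and a try-except-finally block; the bucket name and the service-account JSON path are placeholders that you would need to adapt to your own project:

    from typing import List

    from fastapi import FastAPI, File, HTTPException, UploadFile
    from fastapi.concurrency import run_in_threadpool  # re-exported from starlette.concurrency
    from google.cloud import storage

    app = FastAPI()


    @app.post("/file/")
    async def create_upload_file(files: List[UploadFile] = File(...)):
        # Placeholder credentials and bucket name; adjust to your project
        storage_client = storage.Client.from_service_account_json('service_account.json')
        bucket = storage_client.bucket('data')

        for file in files:
            blob = bucket.blob(file.filename)
            try:
                contents = await file.read()
                # upload_from_string() performs blocking I/O, so run it in a separate thread
                await run_in_threadpool(blob.upload_from_string, contents)
            except Exception:
                raise HTTPException(status_code=500, detail="File upload failed")
            finally:
                # UploadFile.close() is awaitable in recent Starlette versions
                await file.close()

        return {"filenames": [file.filename for file in files]}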