Tags: python, google-cloud-platform, google-cloud-storage, google-compute-engine

Overwrite single file in a Google Cloud Storage bucket, via Python code


I have a logs.txt file at a certain location on a Compute Engine VM instance. I want to periodically back up (i.e. overwrite) logs.txt in a Google Cloud Storage bucket. Since logs.txt is produced by some preprocessing done inside a Python script, I also want to use that script to upload the file to the bucket (so using cp is not an option). Both the Compute Engine VM instance and the Cloud Storage bucket are in the same GCP project, so "they see each other". What I am attempting right now, based on this sample code, looks like:

from google.cloud import storage

bucket_name = "my-bucket"
destination_blob_name = "logs.txt"
source_file_name = "logs.txt"  # accessible from this script

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)

generation_match_precondition = 0
blob.upload_from_filename(source_file_name, if_generation_match=generation_match_precondition)

print(f"File {source_file_name} uploaded to {destination_blob_name}.")

If gs://my-bucket/logs.txt does not exist, the script works correctly, but if I try to overwrite, I get the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2571, in upload_from_file
    created_json = self._do_upload(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2372, in _do_upload
    response = self._do_multipart_upload(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 1907, in _do_multipart_upload
    response = upload.transmit(
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 153, in transmit
    return _request_helpers.wait_and_retry(
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/_request_helpers.py", line 147, in wait_and_retry
    response = func()
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 149, in retriable_request
    self._process_response(result)
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_upload.py", line 114, in _process_response
    _helpers.require_status_code(response, (http.client.OK,), self._get_status_code)
  File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_helpers.py", line 105, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 412, 'Expected one of', <HTTPStatus.OK: 200>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/my_folder/upload_to_gcs.py", line 76, in <module>
    blob.upload_from_filename(source_file_name, if_generation_match=generation_match_precondition)
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2712, in upload_from_filename
    self.upload_from_file(
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2588, in upload_from_file
    _raise_from_invalid_response(exc)
  File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 4455, in _raise_from_invalid_response
    raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.PreconditionFailed: 412 POST https://storage.googleapis.com/upload/storage/v1/b/production-onementor-dt-data/o?uploadType=multipart&ifGenerationMatch=0: {
  "error": {
    "code": 412,
    "message": "At least one of the pre-conditions you specified did not hold.",
    "errors": [
      {
        "message": "At least one of the pre-conditions you specified did not hold.",
        "domain": "global",
        "reason": "conditionNotMet",
        "locationType": "header",
        "location": "If-Match"
      }
    ]
  }
}
: ('Request failed with status code', 412, 'Expected one of', <HTTPStatus.OK: 200>)

I have checked the documentation for upload_from_filename, but there seems to be no flag to "enable overwriting".

How do I properly overwrite a file that already exists in a Google Cloud Storage bucket, using Python?


Solution

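  • The error is caused by if_generation_match. The documentation for that parameter says:

    As a special case, passing 0 as the value for if_generation_match makes the operation succeed only if there are no live versions of the blob.

    Your script always passes 0, so the upload succeeds only while gs://my-bucket/logs.txt does not exist. Once the object exists, the precondition fails, and that is what the 412 response "At least one of the pre-conditions you specified did not hold." means.

    To overwrite unconditionally, pass None or leave out that argument altogether.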
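The advice above can be sketched as a small variation of the script from the question. This is a minimal sketch, not a definitive implementation: overwrite_blob and main are hypothetical names introduced here for illustration, and the bucket and file names are the placeholders from the question. The only change that matters is that upload_from_filename is called without if_generation_match.

```python
def overwrite_blob(bucket, destination_blob_name, source_file_name):
    """Upload a local file to the bucket, replacing any existing object.

    `bucket` is expected to be a google.cloud.storage Bucket; this helper
    name is hypothetical and not part of the library.
    """
    blob = bucket.blob(destination_blob_name)
    # No if_generation_match argument is passed, so the upload carries no
    # precondition and succeeds whether or not the object already exists.
    blob.upload_from_filename(source_file_name)
    return blob


def main():
    # Imported inside main so the helper above does not require the
    # library just to be defined.
    from google.cloud import storage

    bucket_name = "my-bucket"          # placeholder from the question
    destination_blob_name = "logs.txt"
    source_file_name = "logs.txt"      # accessible from this script

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    overwrite_blob(bucket, destination_blob_name, source_file_name)
    print(f"File {source_file_name} uploaded to {destination_blob_name}.")
```

Calling main() from the preprocessing script performs the upload. Because there is no precondition, each run replaces the live version of logs.txt in the bucket; if object versioning is enabled on the bucket, the previous version is kept as a noncurrent version.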