I have a logs.txt file at a certain location on a Compute Engine VM instance. I want to periodically back up (i.e. overwrite) logs.txt in a Google Cloud Storage bucket. Since logs.txt is the result of some preprocessing done inside a Python script, I also want to use that script to upload / copy the file into the Google Cloud Storage bucket (therefore, using cp is not an option). Both the Compute Engine VM instance and the Cloud Storage bucket are in the same GCP project, so "they see each other". What I am attempting right now, based on this sample code, looks like:
from google.cloud import storage
bucket_name = "my-bucket"
destination_blob_name = "logs.txt"
source_file_name = "logs.txt" # accessible from this script
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
generation_match_precondition = 0
blob.upload_from_filename(source_file_name, if_generation_match=generation_match_precondition)
print(f"File {source_file_name} uploaded to {destination_blob_name}.")
If gs://my-bucket/logs.txt does not exist, the script works correctly, but if I try to overwrite it, I get the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2571, in upload_from_file
created_json = self._do_upload(
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2372, in _do_upload
response = self._do_multipart_upload(
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 1907, in _do_multipart_upload
response = upload.transmit(
File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 153, in transmit
return _request_helpers.wait_and_retry(
File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/_request_helpers.py", line 147, in wait_and_retry
response = func()
File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/requests/upload.py", line 149, in retriable_request
self._process_response(result)
File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_upload.py", line 114, in _process_response
_helpers.require_status_code(response, (http.client.OK,), self._get_status_code)
File "/usr/local/lib/python3.8/dist-packages/google/resumable_media/_helpers.py", line 105, in require_status_code
raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 412, 'Expected one of', <HTTPStatus.OK: 200>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/my_folder/upload_to_gcs.py", line 76, in <module>
blob.upload_from_filename(source_file_name, if_generation_match=generation_match_precondition)
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2712, in upload_from_filename
self.upload_from_file(
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 2588, in upload_from_file
_raise_from_invalid_response(exc)
File "/usr/local/lib/python3.8/dist-packages/google/cloud/storage/blob.py", line 4455, in _raise_from_invalid_response
raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.PreconditionFailed: 412 POST https://storage.googleapis.com/upload/storage/v1/b/production-onementor-dt-data/o?uploadType=multipart&ifGenerationMatch=0: {
"error": {
"code": 412,
"message": "At least one of the pre-conditions you specified did not hold.",
"errors": [
{
"message": "At least one of the pre-conditions you specified did not hold.",
"domain": "global",
"reason": "conditionNotMet",
"locationType": "header",
"location": "If-Match"
}
]
}
}
: ('Request failed with status code', 412, 'Expected one of', <HTTPStatus.OK: 200>)
I have checked the documentation for upload_from_filename, but it seems there is no flag to "enable overwriting".
How do I properly overwrite an existing file in a Google Cloud Storage bucket using Python?
This happens because of if_generation_match:
As a special case, passing 0 as the value for if_generation_match makes the operation succeed only if there are no live versions of the blob.
Since gs://my-bucket/logs.txt already exists on the second run, that precondition fails, which is what the response message "At least one of the pre-conditions you specified did not hold." means.
To overwrite the file, pass None or leave out that argument altogether.
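A minimal sketch of that fix follows. The helper names overwrite_blob and overwrite_blob_if_unchanged, and the optional client parameter, are hypothetical and only there for illustration. The first helper uploads with no precondition, so any live version is replaced. The second is an optional variant, assuming you still want protection against concurrent writers: it matches the object's current generation instead of 0.

```python
def _default_client():
    # Imported lazily so the helpers can also be exercised with a stand-in client.
    from google.cloud import storage
    return storage.Client()


def overwrite_blob(bucket_name, source_file_name, destination_blob_name, client=None):
    """Upload a local file, replacing any existing object with the same name."""
    if client is None:
        client = _default_client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    # No if_generation_match argument: the upload succeeds whether or not
    # the object already exists.
    blob.upload_from_filename(source_file_name)
    return blob


def overwrite_blob_if_unchanged(bucket_name, source_file_name, destination_blob_name, client=None):
    """Overwrite only if the object has not changed since it was looked up."""
    if client is None:
        client = _default_client()
    bucket = client.bucket(bucket_name)
    existing = bucket.get_blob(destination_blob_name)  # None if the object is missing
    # 0 means "must not exist yet"; otherwise require the generation we just saw.
    generation = existing.generation if existing is not None else 0
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name, if_generation_match=generation)
    return blob
```

With the names from the question, the call would be overwrite_blob("my-bucket", "logs.txt", "logs.txt"). For a periodic backup written by a single script, the first helper is probably enough. The second one can still raise PreconditionFailed if another writer replaces the object between the lookup and the upload.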