Search code examples
pythongoogle-cloud-platformgoogle-cloud-storagedigital-oceandigital-ocean-spaces

How to upload from Digital Ocean Storage to Google Cloud Storage, directly, programatically without rclone


I want to migrate files from Digital Ocean Storage into Google Cloud Storage programatically without rclone.

I know the exact location file that resides in the Digital Ocean Storage(DOS), and I have the signed url for the Google Cloud Storage(GCS).

How can I modify the following code so I can copy the DOS file directly into GCS without intermediate download to my computer ?

def upload_to_gcs_bucket(blob_name, path_to_file, bucket_name):
    """ Upload data to a bucket"""

    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(
        'creds.json')

    #print(buckets = list(storage_client.list_buckets())

    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(path_to_file)

    #returns a public url
    return blob.public_url

Solution

  • Google's Storage Transfer Servivce should be an answer for this type of problem (particularly because DigitalOcean Spaces like most is S3-compatible. But (!) I think (I'm unfamiliar with it and unsure) it can't be used for this configuration.

    There is no way to transfer files from a source to a destination without some form of intermediate transfer but what you can do is use memory rather than using file storage as the intermediary. Memory is generally more constrained than file storage and if you wish to run multiple transfers concurrently, each will consume some amount of storage.

    It's curious that you're using Signed URLs. Generally Signed URLs are provided by a 3rd-party to limit access to 3rd-party buckets. If you own the destination bucket, then it will be easier to use Google Cloud Storage buckets directly from one of Google's client libraries, such as Python Client Library.

    The Python examples include uploading from file and from memory. It will likely be best to stream the files into Cloud Storage if you'd prefer to not create intermediate files. Here's a Python example