Tags: google-cloud-platform, cloud-storage, bucket

How to copy a file automatically between 2 buckets in two different GCP projects?


Currently I use this command, and it works well:

gsutil cp gs://bucket1/file.xml gs://bucket2/destination_folder

(bucket1 is in project1, and bucket2 is in another GCP project)

But I would like to run that command every day at 9 am. How can I do that on my GCP project in an easy way?

Edit: It should copy the file from the source bucket to the destination bucket again each day (the two buckets are each in a different project). When the file arrives in the destination bucket it is consumed and ingested into BigQuery automatically; I just want to trigger my gsutil command automatically and stop doing it manually every morning.

(Except the method using the Storage Transfer Service: I don't have rights on the source project, so I cannot activate the service account for the data transfer; I only have rights on the destination project.)
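
Whatever ends up triggering the copy, the cross-project read has to be allowed first: someone with IAM rights on the source bucket must grant your destination project's service account read access. A minimal sketch, assuming a hypothetical service account name and the bucket names from the question:

# Hypothetical service account; bucket1 is the source bucket from the question.
gsutil iam ch \
    serviceAccount:my-copy-sa@my-dest-project.iam.gserviceaccount.com:objectViewer \
    gs://bucket1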

Best regards,

Currently I can copy a file from one bucket into a specific folder of another bucket (note: the 2 buckets are in the same GCP project). I can't get the second method, using a gs:// path, to work.

EDIT 2:

# Imports the Google Cloud client library; don't forget to list it in
# requirements.txt or the deployment will fail.
from google.cloud import storage


def copy_blob(
    bucket_name="prod-data",
    blob_name="test.csv",
    destination_bucket_name="prod-data-f",
    destination_blob_name="channel_p/test.csv",
):
    """Copies a blob from one bucket to another with a new name."""
    storage_client = storage.Client()

    source_bucket = storage_client.bucket(bucket_name)
    # The source file lives under the "huhu/" folder of the source bucket.
    source_blob = source_bucket.blob("huhu/" + blob_name)
    destination_bucket = storage_client.bucket(destination_bucket_name)

    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, destination_blob_name
    )

# Second method (KO): the built-in open() only understands local filesystem
# paths, so it cannot write to a gs:// URI, and download_blob_to_file needs
# a real file object to write into.
#
#   client = storage.Client()
#   with open('gs://prod-data-f/channelp.xml','wb') as file_obj:
#       client.download_blob_to_file(
#           'gs://pathsource/somefolder/channelp.xml', file_obj)
#
# End of second method

    print(
        "Blob {} in bucket {} copied to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )
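
For the record, a variant of that second method that should work is to go through a local temporary file: download_blob_to_file accepts a gs:// URI as the source, but the target must be a real file object. This is a minimal sketch reusing the hypothetical bucket and object names above, not tested code:

from google.cloud import storage


def copy_via_local_file():
    """Sketch: copy a blob between buckets through a local temp file."""
    client = storage.Client()

    # open() needs a local path; /tmp is writable in Cloud Functions.
    with open('/tmp/channelp.xml', 'wb') as file_obj:
        client.download_blob_to_file(
            'gs://pathsource/somefolder/channelp.xml', file_obj)

    # Upload the local copy into the destination bucket.
    destination_bucket = client.bucket('prod-data-f')
    blob = destination_bucket.blob('channelp.xml')
    blob.upload_from_filename('/tmp/channelp.xml')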

Solution

  • The solution I was seeking (it works when I test it):

    main.py

    import base64  # only needed by the commented-out Pub/Sub decoding below
    from google.cloud import storage

    # Set the retry deadline to 60 s.
    DEFAULT_RETRY = storage.retry.DEFAULT_RETRY.with_deadline(60)
    
    def Move2FinalBucket(data, context):

    #    # For a Pub/Sub-triggered function, the message payload arrives
    #    # base64-encoded in data['data']:
    #    if 'data' in data:
    #        name = base64.b64decode(data['data']).decode('utf-8')
    #    else:
    #        name = 'NO_DATA'
    #    print('Message {}!'.format(name))
    
    
        storage_client = storage.Client()

        # Get the source (cache) bucket.
        cache_bucket = storage_client.get_bucket('nameofmysourcebucket', timeout=540, retry=DEFAULT_RETRY)

        # Get the source file to copy.
        blob2transfer = cache_bucket.blob('uu/oo/pp/filename.csv')

        # Get the destination bucket.
        destination_bucket = storage_client.get_bucket('nameofmydestinationbucket', timeout=540, retry=DEFAULT_RETRY)

        # Get the destination file.
        new_file = destination_bucket.blob('kk/filename.csv')

        # Rewrite into new_file; unlike copy_blob, rewrite can resume a large
        # copy by passing the returned token back in until it comes back None.
        token = None
        while True:
            token, bytes_rewritten, total_bytes = new_file.rewrite(
                blob2transfer, token=token, timeout=540, retry=DEFAULT_RETRY)
            if token is None:
                break
    

    requirements.txt

    # Function dependencies, for example:
    # package>=version
    #google-cloud-storage==1.22.0
    google-cloud-storage
    

    Don't forget to run this Cloud Function with a service account that has the Storage Admin role on both buckets (the source bucket is in the other project), and it will work.
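
    To run it every day at 9 am, one option is to put a Pub/Sub topic and a Cloud Scheduler job in front of the function. The commands below are only a sketch: the topic name, job name, runtime, service account, and time zone are assumptions to adapt:

    # Hypothetical names throughout; adapt to your project.
    gcloud pubsub topics create move2final-trigger

    gcloud functions deploy Move2FinalBucket \
        --runtime python39 \
        --trigger-topic move2final-trigger \
        --service-account my-copy-sa@my-dest-project.iam.gserviceaccount.com

    # Fire every day at 9 am (cron syntax; time zone is an assumption).
    gcloud scheduler jobs create pubsub move2final-daily \
        --schedule "0 9 * * *" \
        --time-zone "Europe/Paris" \
        --topic move2final-trigger \
        --message-body "go"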

    Best regards,