python, google-cloud-functions, sftp, pysftp

How to transfer data from Google Cloud Storage to SFTP using Python without writing to the file system


I wrote a program that uses pysftp to download files from a Google Cloud Storage blob to the file system and then upload them from there to an SFTP server. I wondered whether I could bypass the file system and stream the data to SFTP directly.

I am running the program on Google Cloud Functions, where the file system is read-only, so I can't write to disk. Streaming would also be much faster, since it avoids writing to and then reading back from disk.

for blob in storage_client.list_blobs(bucket, prefix=prefix):
    source = blob.name
    destination = local_download_dir + "/" + remove_prefix(blob.name, prefix)
    blob.download_to_filename(destination)

...

with pysftp.Connection(Config.SFTP_HOST, port=Config.SFTP_PORT, username=Config.SFTP_USER, password=Config.SFTP_PWD, cnopts=cnopts) as sftp:

...
files = listdir(local_download_dir)
for f in files:
    sftp.put(local_download_dir + "/" + f)  # upload file to remote


Solution

  • Answering my own question to support the community; I hope some of you find it useful.

    I initially tried the following. It worked, but download_as_bytes() reads the entire object into memory, which can cause memory issues for big files:

    from io import BytesIO
    sftp.putfo(BytesIO(blob.download_as_bytes()), destination) 
    

    Then found a better approach with blob.open:

    with blob.open("rb") as f:
        sftp.putfo(f, destination) 
    

    In stream mode, blob.open reads the object in chunks (the default chunk_size is 40 MB), so memory use stays bounded regardless of file size.
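    Putting the pieces together, the whole bucket-to-SFTP transfer can be done as a single streaming loop. A minimal sketch, assuming credentials similar to the Config values above; the names transfer, sftp_cfg, and the smaller chunk_size are illustrative choices, not part of the original answer:

    ```python
    def remove_prefix(text, prefix):
        # Strip the listing prefix so remote paths mirror the bucket layout.
        return text[len(prefix):] if text.startswith(prefix) else text

    def transfer(bucket_name, prefix, sftp_cfg):
        """Stream every blob under `prefix` to an SFTP server, never touching disk."""
        # Imports are deferred so the pure helper above is usable without
        # google-cloud-storage or pysftp installed (e.g. in local tests).
        from google.cloud import storage
        import pysftp

        storage_client = storage.Client()
        cnopts = pysftp.CnOpts()
        # sftp_cfg is expected to hold host, port, username, password.
        with pysftp.Connection(**sftp_cfg, cnopts=cnopts) as sftp:
            for blob in storage_client.list_blobs(bucket_name, prefix=prefix):
                destination = remove_prefix(blob.name, prefix)
                # chunk_size bounds memory per read; the default is 40 MB.
                with blob.open("rb", chunk_size=10 * 1024 * 1024) as f:
                    sftp.putfo(f, destination)
    ```

    Lowering chunk_size trades throughput for a smaller memory footprint, which can matter on the smaller Cloud Functions memory tiers.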