I wrote a program that uses pysftp to download files from a Google Cloud Storage bucket to the local file system and then upload them from there to an SFTP server. I wondered whether I could bypass the file system and upload the data to SFTP as a stream instead.
I am running the program on Google Cloud Functions, where the file system is read-only, so I can't write to disk. Streaming would also be much faster, because it avoids the extra step of writing to and reading from disk.
from os import listdir

import pysftp
from google.cloud import storage

storage_client = storage.Client()

# download each blob under the prefix to the local directory
for blob in storage_client.list_blobs(bucket, prefix=prefix):
    source = blob.name
    destination = local_download_dir + "/" + remove_prefix(blob.name, prefix)
    blob.download_to_filename(destination)
...
with pysftp.Connection(Config.SFTP_HOST, port=Config.SFTP_PORT, username=Config.SFTP_USER, password=Config.SFTP_PWD, cnopts=cnopts) as sftp:
    ...
    # upload every downloaded file from disk to the SFTP server
    files = listdir(local_download_dir)
    for f in files:
        sftp.put(local_download_dir + "/" + f)  # upload file to remote
Supporting the community by answering my own question. Hope some of you find it useful.
I initially tried the following. It worked, but it can lead to memory issues for big files because the whole blob is downloaded into memory first:
from io import BytesIO
sftp.putfo(BytesIO(blob.download_as_bytes()), destination)
Then I found a better approach using blob.open:
with blob.open("rb") as f:
    sftp.putfo(f, destination)
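Putting it all together, the whole transfer can run without touching the local disk. Here is a minimal sketch based on the code from my question (the Config values, bucket, prefix and remove_prefix helper are the same placeholders as above):

import pysftp
from google.cloud import storage

storage_client = storage.Client()
cnopts = pysftp.CnOpts()

with pysftp.Connection(Config.SFTP_HOST, port=Config.SFTP_PORT,
                       username=Config.SFTP_USER, password=Config.SFTP_PWD,
                       cnopts=cnopts) as sftp:
    for blob in storage_client.list_blobs(bucket, prefix=prefix):
        destination = remove_prefix(blob.name, prefix)
        # open the blob as a readable file object and stream it straight
        # to the SFTP server, with no temporary file on disk
        with blob.open("rb") as f:
            sftp.putfo(f, destination)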
In stream mode the blob is read in chunks; the default chunk_size is 40 MB.
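If memory is tight (for example on a small Cloud Functions instance), the chunk size can be lowered when opening the blob; as far as I can tell, blob.open forwards chunk_size to the underlying reader:

# assumption: read the blob in 8 MiB chunks instead of the 40 MB default
with blob.open("rb", chunk_size=8 * 1024 * 1024) as f:
    sftp.putfo(f, destination)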