python · google-cloud-storage · google-cloud-datalab

Upload files to Google Cloud Storage Bucket from Google Cloud Datalab using Python API


I'm trying to upload files from my Datalab instance, within the notebook itself, to my Google Cloud Storage bucket using the Python API, but I can't figure it out. The code example Google provides in its documentation doesn't seem to work in Datalab. I'm currently using the gsutil command, but I'd like to understand how to do this with the Python API.

File directory (I want to upload the Python files located in the checkpoints folder):

!ls -R

.:
checkpoints  README.md  tpot_model.ipynb

./checkpoints:
pipeline_2020.02.29_00-22-17.py  pipeline_2020.02.29_06-33-25.py
pipeline_2020.02.29_00-58-04.py  pipeline_2020.02.29_07-13-35.py
pipeline_2020.02.29_02-00-52.py  pipeline_2020.02.29_08-45-23.py
pipeline_2020.02.29_02-31-57.py  pipeline_2020.02.29_09-16-41.py
pipeline_2020.02.29_03-02-51.py  pipeline_2020.02.29_11-13-00.py
pipeline_2020.02.29_05-01-17.py

Current Code:

import google.datalab.storage as storage
from pathlib import Path

bucket = storage.Bucket('machine_learning_data_bucket')


for file in Path('').rglob('*.py'):
    # API CODE GOES HERE

Current Working Solution:

!gsutil cp checkpoints/*.py gs://machine_learning_data_bucket

Solution

  • This is the code that worked for me:

    from google.cloud import storage
    from pathlib import Path
    
    storage_client = storage.Client()
    bucket = storage_client.bucket('bucket')
    
    # Upload every .py file under the folder; each blob is named after
    # the bare file name, so the folder prefix is not kept in the bucket.
    for file in Path('/home/jupyter/folder').rglob('*.py'):
        blob = bucket.blob(file.name)
        blob.upload_from_filename(str(file))
        print("File {} uploaded to {}.".format(file.name, bucket.name))
    
    
    

    Output:

    File file1.py uploaded to bucket.
    File file2.py uploaded to bucket.
    File file3.py uploaded to bucket.
    
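    One caveat worth noting: `file.name` drops the directory prefix, so two files with the same name in different subfolders would overwrite each other in the bucket. A hedged sketch that keeps the path relative to the search root instead (the `root` value and the way the bucket is obtained are assumptions, not part of the original answer):

    ```python
    from pathlib import Path

    def blob_name_for(file: Path, root: Path) -> str:
        """Blob name that preserves the path relative to the search root."""
        return file.relative_to(root).as_posix()

    def upload_py_files(bucket, root: Path) -> list:
        """Upload every .py file under root, keeping folder prefixes.

        `bucket` is a google.cloud.storage Bucket, e.g.
        storage.Client().bucket('machine_learning_data_bucket').
        """
        uploaded = []
        for file in root.rglob('*.py'):
            name = blob_name_for(file, root)
            bucket.blob(name).upload_from_filename(str(file))
            uploaded.append(name)
        return uploaded
    ```

    With this naming, `checkpoints/pipeline_2020.02.29_00-22-17.py` keeps its `checkpoints/` prefix in the bucket instead of landing at the top level.
    
    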

    EDIT

    Or you can use:

    import google.datalab.storage as storage
    from pathlib import Path
    
    bucket = storage.Bucket('bucket')
    
    # The datalab API writes objects from an in-memory string, so each
    # file is read as text before uploading.
    for file in Path('/home/jupyter/folder').rglob('*.py'):
        blob = bucket.object(file.name)
        blob.write_stream(file.read_text(), 'text/plain')
        print("File {} uploaded to {}.".format(file.name, bucket.name))
    

    Output:

    File file1.py uploaded to bucket.
    File file2.py uploaded to bucket.
    File file3.py uploaded to bucket.
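
    Since `write_stream` takes an explicit content type, the hard-coded `'text/plain'` could instead be guessed from the filename with the standard library. A small stdlib-only sketch (the binary fallback is my own choice, not from the answer above):

    ```python
    import mimetypes
    from pathlib import Path

    def content_type_for(file: Path) -> str:
        """Guess a MIME type from the file extension.

        Falls back to a generic binary type when the extension is unknown.
        """
        ctype, _ = mimetypes.guess_type(file.name)
        return ctype or 'application/octet-stream'
    ```

    For the files above, `content_type_for(Path('pipeline_2020.02.29_00-22-17.py'))` resolves from the `.py` extension, while an unrecognized extension falls back to `application/octet-stream`.
    
    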