
How do I open images stored in a GCP bucket in Google Datalab?


I have been trying to open an image stored in a GCP bucket from my Datalab notebook. When I call Image.open(), it fails with "No such file or directory: 'images/00001.jpeg'".

My code is:

nama_bucket = storage.Bucket("sample_bucket")
for obj in nama_bucket.objects():
    Image.open(obj.key)

I just need to open the images stored in the bucket and view them. Thanks for the help!


Solution

  • I was able to reproduce the issue and get the same error as you (No such file or directory).

    I will describe the workaround I used to solve it. However, there are a few issues that I can see in the code snippet provided:

    • Class IPython.display.Image has no method 'open'.

    • You will need to pass the Image object to display(), because objects created inside a loop are not rendered automatically.

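    With the Storage APIs for Google Cloud Datalab, what resolved the issue for me was passing the object's URL through the url parameter instead of a filename. obj.key is only the object's name inside the bucket, not a path on the notebook's local disk, which is why opening it as a file fails.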

    Here is the solution that worked for me:

    import google.datalab.storage as storage
    from IPython.display import Image, display
    
    bucket_name = '<my-bucket-name>'
    sample_bucket = storage.Bucket(bucket_name)
    
    for obj in sample_bucket.objects():
        # obj.key is the object's name in the bucket, so build its URL and let the browser load it
        display(Image(url='https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key)))
    

    Let me know if it helps!


    EDIT 1:

    As you mentioned that you're using PIL and would like your images to be handled by it, here's a way to achieve that (I have tested it and it worked well for me):

    import google.datalab.storage as storage
    from PIL import Image
    from IPython.display import display
    import requests
    from io import BytesIO
    
    bucket_name = '<my-bucket-name>'
    sample_bucket = storage.Bucket(bucket_name)
    
    for obj in sample_bucket.objects():
        url = 'https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key)
        # Download the object's bytes over HTTP
        response = requests.get(url)
        # Open the image from memory, since there is no local file for PIL to read
        img = Image.open(BytesIO(response.content))
        print("Filename: {}\nFormat: {}\nSize: {}\nMode: {}".format(obj.key, img.format, img.size, img.mode))
        display(img)
    

    Notice that this way you will not need to use IPython.display.Image at all.
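
    Both snippets above build the object URL with plain string formatting, which assumes that the object name contains only URL-safe characters and that the object is publicly readable. A minimal sketch of a helper that percent-encodes the name first is shown here; public_url is a hypothetical name, not part of google.datalab.storage:

```python
from urllib.parse import quote


def public_url(bucket_name, key):
    """Build the storage.googleapis.com URL for an object name.

    Hypothetical helper for illustration. It only builds the URL; an
    unauthenticated request to it is assumed to succeed only for objects
    that are publicly readable.
    """
    # quote() leaves '/' unescaped by default, so "sub-directory" separators survive
    return 'https://storage.googleapis.com/{}/{}'.format(bucket_name, quote(key))


print(public_url('my-bucket', 'img/test file 1.png'))
```

    In the loops above, this could replace the .format() call, for example url = public_url(bucket_name, obj.key).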


    EDIT 2:

    Indeed, the error "cannot identify image file <_io.BytesIO object at 0x7f8f33bdbdb0>" appears because you have a directory in your bucket, so the loop also tries to open an entry that does not contain image data. To solve this issue, it's important to understand how Google Cloud Storage sub-directories work.

    Here's how I organized the files in my bucket to replicate your situation:

    my-bucket/
        img/
            test-file-1.png
            test-file-2.png
            test-file-3.jpeg
        test-file-4.png
    

    gsutil creates the illusion of a hierarchical file tree by applying a variety of naming rules, so that names behave the way users expect. In fact, test-files 1-3 just happen to have '/' characters in their names, and there is no actual 'img' directory.
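
    To make the flat naming concrete, here is a small pure-Python sketch that treats object names as plain strings. The key list mirrors the layout above; the 'img/' entry is an assumption, standing in for the empty placeholder object that can exist when a folder is created through the Cloud Console, and split_key is a hypothetical helper:

```python
# Object names as a flat list of strings; the '/' characters are part of each name
keys = [
    'img/',                  # assumed placeholder entry for the "directory" itself
    'img/test-file-1.png',
    'img/test-file-2.png',
    'img/test-file-3.jpeg',
    'test-file-4.png',
]


def split_key(key):
    """Split an object name into (prefix, base name) at the last '/'."""
    prefix, _, name = key.rpartition('/')
    return prefix, name


for key in keys:
    prefix, name = split_key(key)
    # An empty base name means the key ends with '/', so it holds no image data
    kind = 'placeholder' if name == '' else 'object'
    print('{} -> prefix={!r}, name={!r} ({})'.format(key, prefix, name, kind))
```

    An entry like 'img/' would be requested by a loop over every object, and its empty content is not something PIL can identify as an image.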

    You can still list all images from your bucket. With the structure mentioned above, this can be achieved, for example, by checking each object's file extension:

    import google.datalab.storage as storage
    from PIL import Image
    from IPython.display import display
    import requests
    from io import BytesIO
    
    bucket_name = '<my-bucket-name>'
    sample_bucket = storage.Bucket(bucket_name)
    
    for obj in sample_bucket.objects():
        # Check that the object is an image by looking at its file extension
        if obj.key.lower().endswith(('.jpg', '.jpeg', '.png')):
            url = 'https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key)
            response = requests.get(url)
            img = Image.open(BytesIO(response.content))
            print("Filename: {}\nFormat: {}\nSize: {}\nMode: {}".format(obj.key, img.format, img.size, img.mode))
            display(img)
    

    If you need to get only the images "stored in a particular sub-directory" of your bucket, you will also need to check the files by name:

    import google.datalab.storage as storage
    from PIL import Image
    from IPython.display import display
    import requests
    from io import BytesIO
    
    bucket_name = '<my-bucket-name>'
    folder = '<name-of-the-directory>'
    sample_bucket = storage.Bucket(bucket_name)
    
    for obj in sample_bucket.objects():
        # Check that the object is an image by looking at its file extension
        is_image = obj.key.lower().endswith(('.jpg', '.jpeg', '.png'))
        # Check that the sub-directory appears as a whole path segment in the object's name
        in_folder = '/{}/'.format(folder.strip('/')) in '/' + obj.key
        if is_image and in_folder:
            url = 'https://storage.googleapis.com/{}/{}'.format(bucket_name, obj.key)
            response = requests.get(url)
            img = Image.open(BytesIO(response.content))
            print("Filename: {}\nFormat: {}\nSize: {}\nMode: {}".format(obj.key, img.format, img.size, img.mode))
            display(img)
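
    Matching on the extension trusts the object's name. As a complementary check, the downloaded bytes can be inspected before they are handed to PIL, which also skips empty or non-image responses. The sketch below is only an illustration under stated assumptions: sniff_image_format is a hypothetical helper, and it recognizes only the PNG and JPEG signatures.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'   # fixed 8-byte header at the start of a PNG file
JPEG_SIGNATURE = b'\xff\xd8\xff'       # start-of-image marker followed by the next marker byte


def sniff_image_format(data):
    """Return 'png' or 'jpeg' based on the leading bytes, else None."""
    if data.startswith(PNG_SIGNATURE):
        return 'png'
    if data.startswith(JPEG_SIGNATURE):
        return 'jpeg'
    return None


print(sniff_image_format(PNG_SIGNATURE + b'rest of file'))
print(sniff_image_format(b''))
```

    In the loops above, this could be called as sniff_image_format(response.content), skipping the object when it returns None.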