Tags: google-app-engine, authentication, google-cloud-storage, acl, blobstorage

Cloud Storage and secure download strategy on App Engine: GCS ACL or Blobstore?


My App Engine app creates Cloud Storage files. The files will be downloaded by a third party. The files contain personal medical information.

What would be the preferred way of downloading:

  1. Using a direct GCS download link with a user READER ACL.
  2. Or using a Blobstore download handler in an App Engine app.

Both solutions require the third party to log in (Google login). Performance is not an issue; privacy and avoiding security errors and mistakes are.

Downloading an encrypted zip file is also an option, but then I would have to store the password in the project. Or should I e-mail a random password?

Update: the App Engine code I used to create a signed download URL:

import time
import urllib
from datetime import datetime, timedelta
from google.appengine.api import app_identity
import os
import base64

API_ACCESS_ENDPOINT = 'https://storage.googleapis.com'

# Use the default bucket in the cloud and not the local SDK one from app_identity
default_bucket = '%s.appspot.com' % os.environ['APPLICATION_ID'].split('~', 1)[1]
google_access_id = app_identity.get_service_account_name()


def sign_url(bucket_object, expires_after_seconds=60):
    """ cloudstorage signed url to download cloudstorage object without login
        Docs : https://cloud.google.com/storage/docs/access-control?hl=bg#Signed-URLs
        API : https://cloud.google.com/storage/docs/reference-methods?hl=bg#getobject
    """

    method = 'GET'
    gcs_filename = '/%s/%s' % (default_bucket, bucket_object)
    content_md5, content_type = None, None

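    # Expiration as a Unix timestamp (seconds since the epoch, UTC).
    # time.time() avoids time.mktime(), which would interpret a UTC
    # datetime as local time and give a wrong value outside a UTC timezone.
    expiration = int(time.time()) + expires_after_seconds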

    # Generate the string to sign.
    signature_string = '\n'.join([
        method,
        content_md5 or '',
        content_type or '',
        str(expiration),
        gcs_filename])

    _, signature_bytes = app_identity.sign_blob(signature_string)
    signature = base64.b64encode(signature_bytes)

    # Set the right query parameters.
    query_params = {'GoogleAccessId': google_access_id,
                    'Expires': str(expiration),
                    'Signature': signature}

    # Return the download URL.
    return '{endpoint}{resource}?{querystring}'.format(endpoint=API_ACCESS_ENDPOINT,
                                                       resource=gcs_filename,
                                                       querystring=urllib.urlencode(query_params))

Solution

  • If only a small number of users need access to all the files in the bucket, solution #1 is sufficient, because managing the ACL stays simple.

    However, if you have many users who each need access to a different set of files in the bucket, solution #1 becomes impractical.

    I'd avoid solution #2 as well, as you'd be paying for unnecessary incoming/outgoing GAE bandwidth.

    A third solution to consider is to let App Engine handle authentication and to write logic that determines which users have access to which files. Then, when a file is requested for download, you create a Signed URL so the data is downloaded directly from GCS. You can set the expiration parameter to a value that works for you; the URL becomes invalid once that time has passed.
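
    The third approach could be sketched roughly as below. This is a minimal illustration, not a definitive implementation: `get_download_url`, `AccessDenied`, and the `permissions` dict are hypothetical names introduced here, and `sign` is assumed to be a function with the same signature as the `sign_url` function from the question.

```python
class AccessDenied(Exception):
    """Raised when a user is not allowed to download the requested object."""


def get_download_url(user_email, object_name, permissions, sign, ttl_seconds=60):
    """Return a short-lived signed URL if the user may read the object.

    permissions: dict mapping object name -> set of allowed user emails
                 (hypothetical; a real app would likely query its datastore).
    sign:        callable(object_name, expires_after_seconds) -> URL,
                 assumed to behave like sign_url from the question.
    """
    allowed_users = permissions.get(object_name, set())
    if user_email is None or user_email not in allowed_users:
        # Deny by default: unknown objects and unknown users get no URL.
        raise AccessDenied('%s may not download %s' % (user_email, object_name))
    # Only an authorized request ever causes a URL to be signed.
    return sign(object_name, ttl_seconds)
```

    In an App Engine request handler, `user_email` would presumably come from the logged-in Google account, and the handler would redirect to the returned URL. Keeping the expiry short limits how long a leaked URL stays usable, since anyone holding a signed URL can download the file until it expires.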