Tags: python, authentication, google-drive-api, jupyter-notebook, google-colaboratory

In Google CoLab Notebook, how to read data from a Public Google Drive AND my personal drive *without* authenticating twice?


I have a Google CoLab notebook used by third parties. The user of the notebook needs it to read CSVs both from their personal mounted GDrive and from a third-party, publicly shared GDrive. As far as I can tell, reading from each of these two sources requires the user to complete an authentication workflow, copying and pasting a verification code each time. The UX would be much improved if they only had to complete a single verification rather than two.

Put another way: if I've already authenticated and verified who I am to mount my drive, then why do I need to do it again to read data from a publicly shared Google Drive?

I figured there would be some way to reuse the authentication from the first method in the second method (see details below), or to somehow request permissions for both in a single step, but I am not having any luck figuring it out.

Background

There has been a lot written about how to read data into Google Colab notebooks. Some good references are "Import data into Google Colaboratory", the Towards Data Science article "3 ways to load CSV files into colab", and Google CoLab's official helper notebook.

To quickly recap, you have a few options, depending on where the data is coming from. If you are working with your own data, then an easy solution is to put your data in Google Drive, and then mount your drive.

from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')

You can then read files as if they were in your local filesystem under /content/mountedDrive/.
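As a rough illustration of what "as if they were in your local filesystem" means, here is a minimal sketch that reads a CSV from under the mount point with ordinary file APIs. The helper name read_csv_rows and the relative path passed to it are hypothetical, and the standard-library csv module is used only to keep the sketch self-contained; pd.read_csv on the same path should behave equivalently.

```python
import csv
from pathlib import Path

# Mount point used in the question; everything below it is plain file access.
MOUNT_POINT = Path("/content/mountedDrive")


def read_csv_rows(relative_path, mount_point=MOUNT_POINT):
    """Read a CSV located under the mounted Drive into a list of dicts.

    relative_path is a hypothetical path below the mount point; the exact
    folder layout inside the mount depends on the user's Drive.
    """
    csv_path = Path(mount_point) / relative_path
    with csv_path.open(newline="") as handle:
        return list(csv.DictReader(handle))
```

Because the mount point is a parameter, the same function can be exercised against any local directory, which makes it easy to check outside of Colab.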

Sometimes mounting your drive is not sufficient. For example, let's say you want to read data from a publicly shared Google Drive owned by a third party. In this case, mounting your drive does not help, because the shared data is not in your Drive. You could copy all of the data out of the third party's drive and into your own, but it would be preferable to read directly from the public Drive, especially if this is a shared notebook that many people use.

In this case, you can use PyDrive (see same references).

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

You have to look up the drive id for your dataset, and then you can read it, e.g., like this:

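import pandas as pd

file_id = 'YOUR_FILE_ID'  # placeholder: the Drive ID of the shared file
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('Filename.csv')  # save a local copy of the file
df = pd.read_csv('Filename.csv')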
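The download-then-read steps can be wrapped in one function, which makes it easier to see that the only PyDrive calls involved are CreateFile and GetContentFile. This is a sketch under assumptions: fetch_csv_rows is a hypothetical name, the drive argument is expected to be an already authenticated PyDrive client like the one built above, and the standard-library csv module stands in for pandas so the snippet has no extra dependencies.

```python
import csv
import os
import tempfile


def fetch_csv_rows(drive, file_id, filename="Filename.csv"):
    """Download a Drive file by ID and parse it as CSV.

    drive is assumed to be an authenticated PyDrive GoogleDrive client;
    file_id is the Drive ID of the shared file.
    """
    with tempfile.TemporaryDirectory() as workdir:
        local_path = os.path.join(workdir, filename)
        remote = drive.CreateFile({"id": file_id})  # handle to the remote file
        remote.GetContentFile(local_path)           # write its content locally
        with open(local_path, newline="") as handle:
            return list(csv.DictReader(handle))
```

Note that this does not change the authentication requirement: the client passed in still has to be authenticated first.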

In both of these workflows, you must authenticate your Google Account by following a special link, copying a code, and pasting the code back into the notebook.

Here is my problem:

I want to do both of these things in the same notebook: (1) read from a mounted Google Drive and (2) read from a publicly shared GDrive. The user of my notebook is a third party. If the notebook runs both sets of code, the user is forced to go through the verification-code step twice. That is a confusing UX, and it seems like it should be unnecessary.

Things I have tried:

Regarding this code:

auth.authenticate_user() # We already authenticated when we mounted our GDrive
gauth = GoogleAuth()

I thought there might be a way to pass the gauth object into the .mount() function so that if credentials already existed, you would not need to re-request authentication with a new verification code. But I have not been able to find documentation on google.colab.drive.mount(), and guessing randomly at passing parameters is not working out.

Alternatively, we could go the other way around and reuse the credentials from .mount() for PyDrive, but I am not sure whether it is possible to save or extract authentication permissions from .mount().

Next I tried running the following code, removing the explicit authenticate_user() call after the mounting had already happened, like this:

from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# auth.authenticate_user() # Commented out, hoping we already authenticated during mounting
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

The first 2 lines run as expected, including the authentication link and verification code. However once we get to the line gauth.credentials = GoogleCredentials.get_application_default() my 3rd party user gets the following error:

   1260         # If no credentials, fail.
-> 1261         raise ApplicationDefaultCredentialsError(ADC_HELP_MSG)
   1262 
   1263     @staticmethod

ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
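The error message names one condition that can be checked up front: whether GOOGLE_APPLICATION_CREDENTIALS points at a credentials file. Below is a minimal sketch of that check. The helper adc_file_configured is a hypothetical name, and it covers only the environment-variable case quoted in the error, not every location a credentials library might search.

```python
import os


def adc_file_configured(environ=None):
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file.

    environ defaults to os.environ; passing a dict makes the check testable.
    This mirrors only the environment-variable condition from the error text.
    """
    env = os.environ if environ is None else environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    return bool(path) and os.path.isfile(path)
```

In the failing run above, this check would presumably return False after mounting alone, which would be consistent with the mount step not setting up Application Default Credentials for PyDrive to reuse.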

I'm not 100% sure what these different lines accomplish, so I tried removing the line that raised the error as well:

from google.colab import drive as mountGoogleDrive
mountGoogleDrive.mount('/content/mountedDrive')

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# auth.authenticate_user() # Commented out, hoping we already authenticated during mounting
gauth = GoogleAuth()
# gauth.credentials = GoogleCredentials.get_application_default() # Commented out, hoping we don't need this line if we are already mounted? 
drive = GoogleDrive(gauth)

This now runs without error. However, when I then try to read a file from the public drive, I get the following error:

InvalidConfigError: Invalid client secrets file ('Error opening file', 'client_secrets.json', 'No such file or directory', 2)

At this point I noticed something that is probably important:

When I run the drive-mounting code, the authentication requests access on behalf of Google Drive File Stream.

When I run the PyDrive authentication, the authentication requests access on behalf of Google Cloud SDK.

So these are different permissions.

So, the question is: is there any way to streamline this and package all of these permissions into a single verification-code authentication workflow? If I want to read from both my mounted Drive AND from a publicly shared GDrive, is it required that the notebook user authenticate twice?

Thanks for any pointers to documentation or examples.


Solution

  • There is no way to do this. The OAuth scopes are different: one is for the Google Drive file system; the other is for the Google Cloud SDK.