Search code examples
rgooglesheets4gargle

Googlesheets quota limit issues - possible failure to use API key


We are currently using google sheets for a research project on crowd forecasts for Covid-19 case and death numbers.

Google Sheets is used for convenience, but we are often running into quota limit issues - even though the number of users we have should be well below what Google allows.

I attempted to create a somewhat reproducible example by setting up a new google account and creating a sheet from which to read.

The first thing I tried (without making any changes to the google account) is this:

library(googledrive)
library(googlesheets4)

# Google sheets authentification -----------------------------------------------
options(gargle_oauth_cache = ".secrets")
drive_auth(cache = ".secrets", email = "iamatestotest@gmail.com")
gs4_auth(token = drive_token())

sheet_id <- "1Z2O5Mce_haceWfduLenJQP-hddXF9biY_4Ydob_psyQ"

n_tries <- 50

for (i in 1:n_tries) {
  data <- read_sheet(ss = sheet_id)
  Sys.sleep(0.5)
  print(i)
}

From what I understand I should be able to make around 300 read requests per minute, but I'm usually not be able to get the loop to run beyond 30-34.

As I wasn't sure the 300 requests are readily available I went to https://console.cloud.google.com, created a new test project (not sure why that is needed) and explicitly activated the googlesheets API and created some credentials. I created an API key as well as an OAuth 2.0 Client ID (although I am admittedly somewhat lost what this does and how to use it).

API quotas

I next tried to login with my api key by running

drive_deauth()
drive_auth_configure(api_key = "thisismyapikey")
gs4_auth(token = drive_api_key())

but that also didn't get me beyond 33ish. I also had a look into the google console, but also couldn't see any traffic - so not sure my API key got actually used?

API usage graphs

I assume this is due to my inability to actually use the API in the intended way. Any help in setting this up / increasing the quota would be much appreciated. If that helps I'm happy to give access to the test account - simply write me a message.


Solution

  • With some kind help from very friendly people I think I mostly figured this out and it was indeed my failure to use the API correctly.

    Why my approach failed

    when you use googlesheets4 and any of its function out of the box, you get asked to authorize the tidyverse API OAuth app (you login with your Google credentials and give the OAuth app access rights). This means that you make all requests through the tidyverse OAuth app, as are all other users in the world who use this functionality. This is very nice as it works out of the box, but runs into limitations if other people are using the package at the same time. Sharing this quota with other people meant that I ran into limitations quite unpredictably.

    How to change the setup to make it work

    There are a couple of things that help to alleviate / solve the problem.

    1. use the devtools version of googlesheets4 (devtools::install_github("tidyverse/googlesheets4"). This dev version of googlesheets4 in turn relies on the dev version of gargle, the package that manages the google authentification. The dev version of gargle has a retry function, that automatically retries your requests if they fail. This should solve the majority of issues.

    2. Get your own OAuth app / google service account. this allows you to manage the authentification process all on your own. You therefore don't have to share your quota with other users around the world. To set up your own OAuth app / google service account, you can do the following (I'm focusing on the google service account here, as that is much easier in practice).

    • Log into https://console.cloud.google.com/. You will be asked to create a project. You can see your projects on the left next to "Google Cloud Platform". enter image description here
    • Type "APIs and Services" into the search bar, press "enable APIs and services" and search for sheets. Enable this API.
    • Go back to the search bar and type in "Credentials" enter image description here
    • Press "Create credentials" and select service account. A service account gives you programmatic access to the APIs. Give it a name and a description. You should be able to skip the optional parts. Create the service account and go back to the credentials overview. You may have to refresh the page or wait a minute.
    • Click on your service account (it looks like a very cryptic email address) and go to the "KEYS" tab.
    • Click "ADD KEY" and create a new key. As key type, select JSON.
    • Download that key and store it somewhere secure. This should be treated as a combination of password and username!
    • Now to actually use your key with googlesheets4, you can run `gs4_auth(path = "path-to-your-service-account.JSON") enter image description here
    • In order to be able to access your google sheets, you need to grant your service account permissions. Go to your google sheet, press share (as you would do to share it with any other user) and type in this cryptic service account email (it should look something like "1234@something.iam.gserviceaccount.com". Everything should work now without you having to log in anywhere. If you have previously tried other things, I would suggest to restart your R session.
    • profit.

    You should now also be able to track the API requests in the google console dashboard.

    Note that there is still a limit of 60 requests per user per minute, so you're not getting your full 300 requests, but maybe it is possible to create several service accounts and balance the load between these. But not having other people's request interfere with yours is a big improvement!