Tags: r, google-cloud-platform, google-bigquery, google-compute-engine, rstudio-server

R - Auth into BigQuery from Google Compute Engine without Service Account Keys


We currently use a service account JSON key to access BigQuery from RStudio non-interactively. But as you may know, this method comes with inherent security issues: anyone who holds the key, on any machine, can access BigQuery. So I was wondering: is there a way to access BigQuery data in RStudio only when the request comes from a particular Compute Engine instance in GCP? Does Google offer a way to authenticate via Compute Engine without any keys? If so, how can I use it from RStudio non-interactively?


Solution

  • Update 2022-01-05:

    You can simply set the option below at the beginning of your script, or via .Rprofile so it is applied automatically on startup:

    options(gargle.gce.timeout = 2)

    Note: on Ubuntu, the full path of Rprofile.site is /usr/lib/R/etc/Rprofile.site.
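    As a minimal sketch of this updated approach (assuming the script runs on a GCE instance whose service account has BigQuery access):

    # Raise gargle's metadata-server timeout before any auth happens;
    # placing this in .Rprofile / Rprofile.site applies it on startup.
    options(gargle.gce.timeout = 2)

    library(bigrquery)
    bq_auth()  # on GCE this falls through to gargle's credentials_gce()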

    ########################## Old Content ##########################

    Update - I was able to perform keyless auth using Application Default Credentials, as DazWilkin had pointed out, but doing so was not straightforward. bigrquery's bq_auth() function uses the gargle package for authentication, and gargle has a function called credentials_gce() to auth using the instance's service account. In theory, on a Compute Engine instance whose service account has access to BigQuery, bq_auth() would call credentials_gce() and auth should just work seamlessly, without any additional parameters.

    In practice, as of v1.2.0 of gargle, I've found a bug: gargle sets a default timeout of 0.8 seconds for fetching a temporary token from Google's metadata server, which is so low that the request times out even before a legitimate response arrives from the internal metadata service. For now, I've found that adding the lines below before bq_auth() solves the problem. In fact, you don't need to call bq_auth() separately, because bigrquery runs it automatically the moment it hits an operation that requires authentication; but in some cases, when combined with the googleCloudStorageR package, bq_auth() doesn't work as expected, so I've added it explicitly.

    library(gargle)
    library(bigrquery)

    # Warm up the metadata server with a more generous timeout than
    # gargle's 0.8s default, so the subsequent token request succeeds.
    invisible(
      try({
        httr::with_config(
          httr::timeout(getOption("gargle.gce.timeout", default = 2)),
          httr::GET(
            "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token",
            httr::add_headers("Metadata-Flavor" = "Google")
          )
        )
      }, silent = TRUE)
    )

    bq_auth()  # picks up the instance's service account via credentials_gce()
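    Once bq_auth() succeeds, queries run as usual. For example (the project ID here is hypothetical, for illustration only):

    # Hypothetical project ID - replace with your own GCP project
    tb <- bq_project_query("my-gcp-project", "SELECT 1 AS ok")
    bq_table_download(tb)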
    

    As an additional note, if someone is looking for how to do the same with googleCloudStorageR, the R package for interacting with Google Cloud Storage, the code goes as follows. Note that googleCloudStorageR uses the googleAuthR package for authentication, so calling gcs_auth() separately isn't needed.

    library(googleAuthR)

    # Same warm-up as above: give the metadata-server request a longer timeout.
    invisible(
      try({
        httr::with_config(
          httr::timeout(getOption("gargle.gce.timeout", default = 2)),
          httr::GET(
            "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token",
            httr::add_headers("Metadata-Flavor" = "Google")
          )
        )
      }, silent = TRUE)
    )

    # Authenticate googleAuthR against the instance's default service account.
    gar_gce_auth(
      scopes = "https://www.googleapis.com/auth/bigquery",
      service_account = "default"
    )