Search code examples
pythongoogle-bigquerygoogle-cloud-storage

Google BigQuery query in Python works when using result(), but Permission issue when using to_dataframe()


I've run into a problem after upgrades of my pip packages and my bigquery connector that returns query results suddenly stopped working with following error message

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file('path/to/file', scopes=['https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/bigquery'
])

client = bigquery.Client(credentials=credentials)
data = client.query('select * from dataset.table').to_dataframe()

PermissionDenied: 403 request failed: the user does not have bigquery.readsessions.create' permission

But! If you switched the code to

data = client.query('select * from dataset.table').result()

(dataframe -> result) you received the data in RowIterator format and were able to properly read them.

The same script using to_dataframe with the same credentials was working on the server. Therefore I set my bigquery package to the same version 2.28.0, which still did not help.

I could not find any advices on this error / topic anywhere, so I just want to share if any of you faced the same thing.


Solution

  • There are different ways of receiving data from bigquery. Using the BQ Storage API is considered more efficient for larger result sets compared to the other options:

    The BigQuery Storage Read API provides a third option that represents an improvement over prior options. When you use the Storage Read API, structured data is sent over the wire in a binary serialization format. This allows for additional parallelism among multiple consumers for a set of results

    The Python BQ library internally determines whether it can use the BQ Storage API or not. For the result method, it uses the tradtional tabledata.list method internally, whereas the to_dataframe method uses the BQ Storage API if the according package is installed.

    However, using the BQ Storage API requires you to have the bigquery.readSessionUser Role respectively the readsessions.create right which in your case seems to be lacking.

    By uninstalling the google-cloud-bigquery-storage, the google-cloud-bigquery package was falling back to the list method. Hence, by de-installing this package, you were working around the lack of rights.

    See the BQ Python Libary Documentation for details.