
Chunking BigQuery results


Hi, I'm using the BigQuery Python library and I'm having some trouble understanding how the RowIterator class works. Given a query job_id, I would like to fetch the result rows in chunks of a fixed size.

I first run the query using the client:

query_job = client.query(query, job_config=job_config)

Later, I fetch the job like this:

job = client.get_job(query_job.job_id)
result_iter = job.result()

I would like to know how job.result() gets the results. Does it fetch results one row at a time from the BigQuery result table, or are the entire results stored in memory? Also, how or where do I specify that I would like to fetch results in a fixed chunk size and get the next page token?

Thanks!


Solution

  • Looking at the RowIterator code, it will attempt to read all results using one GET request and then iterate through them in memory.

    There is no way to set max_results in job.result(), so it will be limited by the BigQuery default limits of 100,000 rows [1] and 10MB total size per response. If the results don't fit within these limits, the iterator will try to read the next page.

    The list_rows API [2] allows you to set max_results and a page_token. If your goal is to read all job results in pages of a defined size, you can use this API to create a separate RowIterator for each page; you can get the next page token from the iterator [3]. A sketch of this pattern follows the reference links below. Hope this helps.

    [1] https://cloud.google.com/bigquery/docs/paging-results

    [2] https://github.com/GoogleCloudPlatform/google-cloud-python/blob/16b0f8eb9adacf47127521f93c13380a332268b1/bigquery/google/cloud/bigquery/client.py#L1218

    [3] https://github.com/GoogleCloudPlatform/google-cloud-python/blob/b0dc7bc17d8bb3ab56928f28096846c55cb79893/api_core/google/api_core/page_iterator.py#L340
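For illustration, here is a minimal sketch of that pattern. It assumes the query job has already finished; the job ID string, the chunk size of 1000, and the print call are placeholders to replace with your own job ID, page size, and row handling.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Assumes the query job has already completed; replace "your-job-id"
    # with a real job ID (e.g. query_job.job_id from the question).
    job = client.get_job("your-job-id")

    # Query results live in the job's destination table (an anonymous
    # table for ordinary queries). Fetch it so list_rows has the schema.
    table = client.get_table(job.destination)

    page_token = None
    while True:
        # Each call issues its own request, capped at 1000 rows and
        # resuming from the previous page's token.
        rows = client.list_rows(table, max_results=1000, page_token=page_token)
        for row in rows:
            print(row)  # placeholder for real per-row processing

        # next_page_token is None once the last page has been read.
        page_token = rows.next_page_token
        if page_token is None:
            break

Since each list_rows call is an independent request, nothing from the previous iterator needs to be kept alive between chunks; only the destination table reference and the page token have to be carried over, which also makes it easy to resume paging later.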