google-cloud-platform, google-bigquery, google-api, google-cloud-python

Using time partitioning for BigQuery load doesn't upload every row


I'm attempting to use the BigQuery Python API client to upload a large DataFrame. The upload works; however, when a time partition is specified, only some rows are uploaded. When time partitioning is omitted, all rows are uploaded.

JOB_CONFIG:

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE", autodetect=True)
job_config.time_partitioning = bigquery.TimePartitioning(type_=bigquery.TimePartitioningType.DAY, field="date")
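
For context, here is the full load call around that config, continuing from the job_config above — a minimal sketch, assuming a DataFrame named df and an illustrative table ID my_project.my_dataset.my_table:

client = bigquery.Client()
# Load the DataFrame into the partitioned table and wait for the job to finish.
load_job = client.load_table_from_dataframe(
    df, "my_project.my_dataset.my_table", job_config=job_config
)
load_job.result()  # raises if the load job failed
# Compare the destination table's row count against len(df).
print(client.get_table("my_project.my_dataset.my_table").num_rows)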

The DataFrame has 120,532 rows, but when the table's rows are checked in BigQuery, only 62,433 rows have been added. I've checked the date column in the DataFrame to make sure every row has a date, and I believe that is the case.
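
For reference, a quick way to run that sanity check (a sketch, assuming the DataFrame is named df):

# Count missing dates and inspect the span of dates the load would cover.
print(df["date"].isna().sum())             # expect 0 if every row has a date
print(df["date"].min(), df["date"].max())  # how far back does the data go?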

Any other ideas on what could be causing an incomplete upload?


Solution

  • In BigQuery sandbox mode there is a 60-day partition expiration limit. I believe that when partitioning is added, the uploaded table is silently limited to the most recent 60 days of data: older partitions expire and their rows are dropped, without any message that this has happened.
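
One way to verify this is to read the table's partition expiration back from the API and count how many DataFrame rows fall inside that window — a sketch, assuming the same df and the illustrative table ID from above:

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my_project.my_dataset.my_table")  # illustrative table ID
# In sandbox mode this is set automatically: 60 days = 5,184,000,000 ms.
print(table.time_partitioning.expiration_ms)

# Rows whose date falls in a partition older than 60 days will have expired.
cutoff = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=60)
recent = (pd.to_datetime(df["date"], utc=True) >= cutoff).sum()
print(recent)  # should roughly match the 62,433 rows seen in BigQuery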