apache-spark, pyspark, google-bigquery

Saving partitioned table with BigQuery Spark connector


I want to create a table from PySpark with the two options below (partition by a column and require a partition filter), but I can't see how to do this with the BigQuery connector.

This is how I would do it in BigQuery:

CREATE TABLE dataset.table
PARTITION BY DATE_TRUNC(collection_date, DAY)
OPTIONS (require_partition_filter = TRUE)
AS SELECT XXXX

This is what I normally do:

(
    dataframe
    .write
    .format("bigquery")
    .mode(mode)
    .save(f"{dataset}.{table_name}")
)

Solution

  • You can use the partitionField, datePartition, and partitionType write options to partition the table.

    For clustering, use clusteredFields.

    The full list of properties is documented here (see the sketch after this list for how the options fit together):

    https://github.com/GoogleCloudDataproc/spark-bigquery-connector#properties
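
Putting those options together, a write could look roughly like the sketch below. This is untested and only illustrates the option names from the properties page: the staging bucket and clustering column are placeholders, collection_date is the partition column from the question, and temporaryGcsBucket is assumed because the connector's default (indirect) write path stages data through GCS.

(
    dataframe
    .write
    .format("bigquery")
    .mode(mode)
    .option("temporaryGcsBucket", "my-staging-bucket")  # placeholder; used by the indirect write path
    .option("partitionField", "collection_date")        # column to partition on
    .option("partitionType", "DAY")                     # partition granularity
    .option("clusteredFields", "some_column")           # placeholder; comma-separated clustering columns
    .save(f"{dataset}.{table_name}")
)

I don't see a requirePartitionFilter option in that property list, so the require_partition_filter flag may need to be applied afterwards, for example with ALTER TABLE dataset.table SET OPTIONS (require_partition_filter = TRUE) in BigQuery.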