Search code examples
google-bigqueryclustered-index

Make existing bigquery table clustered


I have a quite huge existing partitioned table in bigquery. I want to make the table clustered, at least for the new partition.

From the documentation: https://cloud.google.com/bigquery/docs/creating-clustered-tables, it is said that we are able to Creating a clustered table when you load data and I have tried to load a new partition using clustering fields: job_config.clustering_fields = ["event_type"].

The load finished successfully, however it seems that the new partition is not clustered (I am not really sure how to check whether it is clustered or not, but when I query to that particular partition it would always scan all rows).

Is there a good way to make clustering field for an existing partitioned table?

Any comment, suggestion, or answer is well appreciated.

Thanks a lot, Yosua


Solution

  • This answer is no longer valid / correct

    https://cloud.google.com/bigquery/docs/creating-clustered-tables#modifying-cluster-spec


    You can only specify clustering columns when a table is created
    So, obviously you cannot expect existing non-clustered table and especially just new partitions to become clustered

    The "workaround" is to create new table to be properly partitioned / clustered and load data into it from Google Cloud Storage (GCS). You can export data from original table into GCS first for this so whole process will be free of charge