google-bigquery, gcloud, google-cloud-storage, pybigquery

Exporting a BigQuery table to GCS to transfer data to a new table


I need to move 20 million records (approx. 10 GB) from an unpartitioned BigQuery table to a new partitioned table.

The approach is to export the original table to a GCS bucket in JSON format using wildcard URIs. As the documentation describes, I get 304 JSON files (approx. 21 GB) of varying sizes. I then transfer that data to a new partitioned table using a BigQuery Data Transfer Service job, which completes successfully. I have also tried doing it with a call to load_table_from_uri in Python.
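
For reference, a minimal sketch of the export-then-load flow with the Python client (the project, dataset, bucket, and `dateField` partition column names are placeholders, not the real ones):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Export the unpartitioned table to GCS as newline-delimited JSON,
    # using a wildcard URI so BigQuery can shard the output into many files.
    extract_job = client.extract_table(
        "my_project.my_dataset.old_table",            # placeholder source table
        "gs://my-bucket/export/old_table-*.json",     # placeholder wildcard URI
        job_config=bigquery.ExtractJobConfig(
            destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
        ),
    )
    extract_job.result()  # wait for the export to finish

    # Load the exported files into the new partitioned table.
    load_job = client.load_table_from_uri(
        "gs://my-bucket/export/old_table-*.json",
        "my_project.my_dataset.new_table",            # placeholder destination table
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            time_partitioning=bigquery.TimePartitioning(field="dateField"),
            autodetect=True,
        ),
    )
    load_job.result()  # wait for the load to finish

    # Compare the destination row count against the source to spot the gap.
    print(client.get_table("my_project.my_dataset.new_table").num_rows)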

The problem is that the destination table ends up with only 3.3 million records instead of 20 million. I have looked into possible limits, to no avail, considering:

  1. export limits
  2. bucket size limits
  3. data transfer limits
  4. load_table_from_uri limits

Has anyone had a similar experience? Is there a limitation I am not seeing, or something to be aware of in the procedure?

Thanks,

Sergio Mujica


Solution

  • Is there a reason you are not doing this with DDL directly?

    CREATE TABLE `datasetId.newTableId`
    PARTITION BY dateField
    CLUSTER BY field2, field3
    AS
    SELECT * FROM `datasetId.oldTableId`
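
Since the question already uses the Python client, here is a hedged sketch of running that same DDL through it; the dataset, table, and field names are copied from the statement above and would need to match the real schema. The CTAS statement keeps everything inside BigQuery, so no GCS export/load round trip is involved.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Run the CREATE TABLE ... AS SELECT as an ordinary query job.
    ddl = """
    CREATE TABLE `datasetId.newTableId`
    PARTITION BY dateField
    CLUSTER BY field2, field3
    AS
    SELECT * FROM `datasetId.oldTableId`
    """
    client.query(ddl).result()  # wait for the job to finish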