I need to move 20 million records (approx. 10 GB) from an unpartitioned BigQuery table to a new partitioned table.
The approach is to export the original table to a GCS bucket in newline-delimited JSON format, using wildcard URIs. This produces 304 JSON files (approx. 21 GB) of varying sizes, just as the documentation says. I then load that data into a new partitioned table with a BigQuery Data Transfer Service job, which finishes successfully. I have also tried doing it with a call to load_table_from_uri in Python.
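For context, here is a minimal sketch of that load call with the google-cloud-bigquery client (the project, dataset, table, and bucket names, the dateField column, and the write-disposition/autodetect choices are placeholders, not my exact code):

from google.cloud import bigquery

# All names below (project, dataset, tables, bucket, dateField) are placeholders.
client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # start from an empty table on reruns
    time_partitioning=bigquery.TimePartitioning(field="dateField"),
    autodetect=True,  # or pass an explicit schema instead
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/export/records-*.json",  # same wildcard pattern used for the export
    "myProject.datasetId.newTableId",
    job_config=job_config,
)
load_job.result()  # raises if the job fails

# Sanity check: this should report ~20 million rows and no errors.
print(load_job.output_rows, load_job.errors)

Comparing load_job.output_rows (and load_job.errors) against the source row count after each attempt at least narrows down whether rows go missing during the export or during the load.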
The problem is that the destination table ends up with only 3.3 million records instead of 20 million. I have been looking into possible limits, to no avail.
Has anyone had a similar experience? Is there a limitation I am not seeing, or something in the procedure to be aware of?
Thanks,
Sergio Mujica
Is there a reason you are not doing this with DDL directly?
CREATE TABLE `datasetId.newTableId`
PARTITION BY dateField
CLUSTER BY field2, field3
AS
SELECT * FROM `datasetId.oldTableId`;
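This runs entirely inside BigQuery: no export to GCS, no JSON round trip, and the new table comes out partitioned (and clustered) in a single job, so there is no intermediate step where rows can be dropped.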