google-bigquery, gcloud, google-cloud-storage, pybigquery

Exporting a BigQuery table to GCS to transfer data to a new table


I need to move 20 million records (approx. 10 GB) from an unpartitioned BigQuery table to a new partitioned table.

The approach is to export the original table to a GCS bucket in JSON format using wildcard URIs. As the documentation describes, I get 304 JSON files (approx. 21 GB) of varying sizes. I then transfer that data to a new partitioned table using a BigQuery Data Transfer Service job, which completes successfully. I have also tried doing it with a call to load_table_from_uri in Python.
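
For reference, a minimal sketch of the export-then-load flow with the Python client (the project, dataset, bucket, and `dateField` partition column names are placeholders, not the real ones):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Export the unpartitioned table to GCS as newline-delimited JSON,
    # using a wildcard URI so BigQuery can shard the output into many files.
    extract_job = client.extract_table(
        "my_project.my_dataset.old_table",            # placeholder source table
        "gs://my-bucket/export/old_table-*.json",     # placeholder wildcard URI
        job_config=bigquery.ExtractJobConfig(
            destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
        ),
    )
    extract_job.result()  # wait for the export to finish

    # Load the exported files into the new partitioned table.
    load_job = client.load_table_from_uri(
        "gs://my-bucket/export/old_table-*.json",
        "my_project.my_dataset.new_table",            # placeholder destination table
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            time_partitioning=bigquery.TimePartitioning(field="dateField"),
            autodetect=True,
        ),
    )
    load_job.result()  # wait for the load to finish

    # Compare the destination row count against the source to spot the gap.
    print(client.get_table("my_project.my_dataset.new_table").num_rows)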

The problem is that the destination table ends up with only 3.3 million records instead of 20 million. I have looked into possible limits, to no avail, considering:

  1. export limits
  2. bucket size limits
  3. data transfer limits
  4. load_table_from_uri limits

Has anyone had a similar experience? Is there a limitation I am not seeing, or something to be aware of in the procedure?

Thanks,

Sergio Mujica


Solution

  • Is there a reason you are not doing this with DDL directly?

    CREATE TABLE `datasetId.newTableId`
    PARTITION BY dateField
    CLUSTER BY field2, field3
    AS
    SELECT * FROM `datasetId.oldTableId`
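
Since the question already uses the Python client, here is a hedged sketch of running that same DDL through it; the dataset, table, and field names are copied from the statement above and would need to match the real schema. The CTAS statement keeps everything inside BigQuery, so no GCS export/load round trip is involved.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Run the CREATE TABLE ... AS SELECT as an ordinary query job.
    ddl = """
    CREATE TABLE `datasetId.newTableId`
    PARTITION BY dateField
    CLUSTER BY field2, field3
    AS
    SELECT * FROM `datasetId.oldTableId`
    """
    client.query(ddl).result()  # wait for the job to finish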