model google-bigquery byte training-data

Why is the Bytes Processed for my query 100 GB when my dataset is only 2.4 GB in Google BigQuery ML?


I ran a CREATE MODEL statement on a 2.4 GB table (not an external table), and it ran for 14 hours and 25 minutes, as shown:

[screenshot]

This is the job information

The first image says that my query will process 2.4 GB when run. The second one says it processed and billed 100 GB. Any idea why?


Solution

  • For time series models (assuming that is what you have), when auto-arima is enabled for automatic hyperparameter tuning, multiple candidate models are fitted and evaluated during the training phase. In this case, the number of bytes processed by the input SELECT statement is multiplied by the number of candidate models, which can be controlled by the AUTO_ARIMA_MAX_ORDER training option.

    Also, the CREATE MODEL statement stops at 50 iterations for iterative models.

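    Combining the two facts above with your numbers (roughly 2 GB and 100 GB), this looks like it answers your question: the input is scanned once per candidate model or iteration, so about 2 GB multiplied by up to 50 passes lands in the 100 GB range.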
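
To make the arithmetic concrete, here is a minimal Python sketch. It assumes that bytes processed scale linearly with the number of passes over the input (one pass per candidate model or iteration), which is a simplification of how BigQuery ML bills. The function names `estimated_gb_processed` and `implied_passes` are hypothetical helpers for illustration, not part of any BigQuery API.

```python
def estimated_gb_processed(input_gb: float, passes: int) -> float:
    """Rough estimate: the input SELECT is scanned once per pass.

    Assumption: cost scales linearly with the number of passes
    (candidate models or training iterations).
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    return input_gb * passes


def implied_passes(billed_gb: float, input_gb: float) -> float:
    """Work backwards: how many passes would explain the billed bytes?"""
    if input_gb <= 0:
        raise ValueError("input_gb must be positive")
    return billed_gb / input_gb


if __name__ == "__main__":
    input_gb = 2.4     # size reported by the query validator (from the question)
    billed_gb = 100.0  # bytes billed reported in the job information

    # Upper bound if training used all 50 iterations.
    print(f"Upper bound at 50 passes: {estimated_gb_processed(input_gb, 50):.1f} GB")

    # Number of passes implied by the billed amount.
    print(f"Implied passes: {implied_passes(billed_gb, input_gb):.1f}")
```

With the numbers from the question, the 50-pass upper bound is about 120 GB and the billed 100 GB implies roughly 42 passes, which fits under the 50-iteration cap.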