Tags: database, google-cloud-platform, google-bigquery, google-cloud-storage, etl

Data loading and transformation from GCS to BigQuery


I want to load some unstructured JSON files (up to 200 GB in size) into BigQuery without using any ETL tools. I am looking for a simple way to transform the data in GCS into a properly structured JSON format, and to apply some other custom transformation logic, before loading it into BigQuery. The challenge is how to achieve this without any ETL tool or high-compute resources.


Solution

  • The idea is to break the 200 GB file into smaller pieces and then use Cloud Functions. The way I see it, you can do the splitting either by deploying a Cloud Run service (it has a memory cap of 16 GB) or by breaking the file up manually. Then use a Cloud Function to transform each piece so it can be loaded into BigQuery.
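As a minimal sketch of the transform step, the Cloud Function could stream each chunk line by line, reshape every record, and emit newline-delimited JSON (the format BigQuery load jobs accept). The `transform_record` logic and the field names (`id`, `type`, `payload`) are hypothetical stand-ins for your custom transformation logic, and the sample input replaces a real GCS read:

```python
import json
from typing import Iterable, Iterator


def transform_record(raw: dict) -> dict:
    # Hypothetical custom transform: flatten a nested payload and
    # normalize key names (stand-in for your real business logic).
    payload = raw.get("payload", {})
    return {
        "id": raw.get("id"),
        "event_type": raw.get("type", "unknown"),
        **{f"payload_{k}": v for k, v in payload.items()},
    }


def to_ndjson(lines: Iterable[str]) -> Iterator[str]:
    # Turn a stream of raw JSON lines into structured newline-delimited
    # JSON, skipping blank or unparseable lines instead of buffering
    # the whole file in memory.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            raw = json.loads(line)
        except json.JSONDecodeError:
            continue  # or route bad lines to a dead-letter bucket
        yield json.dumps(transform_record(raw))


# Sample chunk standing in for a blob streamed from GCS.
raw_chunk = [
    '{"id": 1, "type": "click", "payload": {"page": "/home"}}',
    "not valid json",
    '{"id": 2, "payload": {"page": "/cart", "ms": 42}}',
]
structured = list(to_ndjson(raw_chunk))
```

Because the function only ever holds one line in memory, it stays within a small Cloud Function's footprint regardless of the total file size; the resulting NDJSON objects can then be written back to GCS and loaded with a regular BigQuery load job.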