Search code examples
google-cloud-platformgoogle-bigquerygoogle-cloud-storageanalyticsdata-warehouse

What are the pros and cons of loading data directly into Google BigQuery vs going through Cloud Storage first?


Also, is there anything wrong with doing transforms/joins directly within BigQuery? I'd like to minimize the number of components and steps involved for a data warehouse I'm setting up (simple transaction and inventory data for a chain of retail stores.)


Solution

  • Loading data via Cloud Storage is the fastest (and the cheapest) way. Loading directly can be done via app (using streaming insert which add some additional cost)

    For the doing transformation - if what are you plan/need to do can be done in BigQuery - you should do it in BigQuery :) - it is the best and fastest way of doing ETL. But you should take in account cost of running query (if you not paying Google for slots - it could be 5$ per 1TB scans)

    Another good options for complex ETL is using Data Flow - but it can became expensive very quick - in exchange of more flexibility.