google-cloud-platform, google-bigquery, etl, data-ingestion

Best way to ingest data to BigQuery


I have heterogeneous sources: flat files residing on-prem, JSON on SharePoint, APIs that serve data, and so on. Which is the best ETL tool to bring the data into a BigQuery environment?

I'm a kindergarten student in GCP :)

Thanks in advance


Solution

  • There are many ways to achieve this. Which one fits best depends on several factors, including:

    1. frequency of data ingestion
    2. whether the data needs to be manipulated before being written into BigQuery (your files may not be formatted correctly)
    3. whether this will be done manually or automated
    4. size of the data being written

    If you are just looking for an ETL tool, you can find many. If you plan to scale this to many pipelines, you might want to look at a more advanced orchestrator such as Airflow; but if you just have a few one-off processes, you could set up a Cloud Function within GCP to accomplish this. You can schedule it (via cron-style Cloud Scheduler jobs), invoke it through an HTTP endpoint, or trigger it with Pub/Sub. You can see an example of how this is done here.
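
    To make that concrete, here is a minimal sketch of what such a Cloud Function could look like in Python, using the google-cloud-bigquery client library. The bucket URI, table ID, and function name are hypothetical placeholders, and the CSV-with-header-row assumption is mine; adapt all of them to your own project.

    ```python
    from google.cloud import bigquery

    def load_csv_to_bigquery(request):
        """HTTP-triggered entry point; `request` is the incoming request object."""
        client = bigquery.Client()

        # Hypothetical locations -- replace with your own bucket and table.
        source_uri = "gs://my-bucket/exports/data.csv"
        table_id = "my-project.my_dataset.my_table"

        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,  # assumes the file has a header row
            autodetect=True,      # let BigQuery infer the schema
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )

        # Kick off the load job and block until it finishes.
        load_job = client.load_table_from_uri(
            source_uri, table_id, job_config=job_config
        )
        load_job.result()  # raises if the load failed

        table = client.get_table(table_id)
        return f"Load complete; {table_id} now has {table.num_rows} rows."
    ```

    From there you could point a Cloud Scheduler job at the function's HTTP endpoint for recurring loads, or swap the HTTP trigger for a Pub/Sub one if you want event-driven ingestion.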