
Oracle to Big Query Tables


I am new to BigQuery and learning it. I have a requirement to load close to 300 tables from an Oracle source into BigQuery staging tables. What is the recommended way to load the data? I know I can use Dataflow for that, but do I have to create 300 Dataflow tasks, or can I create a single job that iterates over the tables? Please share your experience and different ways to do it. Thanks a lot.

Regards, Venkat.


Solution

  • In my experience, when we wanted to migrate our data warehouse into BigQuery, I didn't use Dataflow or any other tool. I just exported the tables into CSV files and then used Python code to iterate over the files and upload them into BigQuery: https://cloud.google.com/bigquery/docs/loading-data-local#python

    Or you can upload them into GCS and then load them into BigQuery. If it's a daily operation, I think it's easier to maintain a single piece of code that iterates over a list of tables, extracts them, and appends them into BigQuery tables than to create 300 separate tasks.
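
    A minimal sketch of that CSV approach, assuming the tables have already been exported to one CSV file per table (the project, dataset, table names, and file paths below are placeholders), using the google-cloud-bigquery client from the linked docs:

    from google.cloud import bigquery

    client = bigquery.Client(project='project-id')

    # one exported CSV file per Oracle table; names are placeholders
    tables = ['customers', 'orders', 'invoices']

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,              # skip the CSV header row
        autodetect=True,                  # let BigQuery infer the schema
        write_disposition='WRITE_APPEND',
    )

    for table in tables:
        with open(f'{table}.csv', 'rb') as source_file:
            job = client.load_table_from_file(
                source_file, f'project-id.dataset_id.{table}', job_config=job_config)
        job.result()  # wait for the load job to finish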

    Update:

    Example code for reading data from Oracle into BigQuery using pandas-gbq:

    import cx_Oracle  # Oracle driver used by SQLAlchemy under the hood
    import pandas as pd
    from sqlalchemy import create_engine

    # use ?service_name=... instead of a trailing /SID if you connect via a service name
    engine = create_engine('oracle://user:password@host_or_scan_address:1521/ORACLE_SERVICE_NAME')

    # read the table in chunks of 5000 rows to keep memory usage low
    results = pd.read_sql('select * from table_name', engine, chunksize=5000)
    if_exists = 'append'  # or 'replace'
    schema = None  # specify a BigQuery schema here if you don't want schema autodetection
    for result in results:
        result.to_gbq(destination_table='dataset_id.table_id', project_id='project-id',
                      table_schema=schema, if_exists=if_exists)
    
    

    You can drop the chunksize argument if you want to load the data as a single chunk, but this might consume a lot of memory if the table is big:

    
    
    results = pd.read_sql('select * from table_name', engine)
    if_exists = 'append'  # or 'replace'
    schema = None  # specify a BigQuery schema here if you don't want schema autodetection
    results.to_gbq(destination_table='dataset_id.table_id', project_id='project-id',
                   table_schema=schema, if_exists=if_exists)