I'm learning to build a machine learning pipeline with TensorFlow Extended (TFX). I followed the tutorial and now I'd like to build my own. I'm getting an error when I ingest the data directly from BigQuery. Please advise, and thanks in advance!
CODE:
from tfx.components.example_gen.big_query_example_gen.component import BigQueryExampleGen
query = """
SELECT * FROM `<project_id>.<database>.<table_name>`
"""
example_gen = BigQueryExampleGen(query=query)
ERROR:
RuntimeError: Missing executing project information. Please use the --project command line option to specify it.
I'm not sure if you solved it already, but to use BigQuery as input you must pass the --project flag to Beam through beam_pipeline_args, like so:
example_gen = components.BigQueryExampleGen(query='SELECT * except(day) FROM `gofind-datalake.data.temp_dist` where rand() < 2800/30713393 limit 3000')
context.run(example_gen, beam_pipeline_args=["--project=gofind-datalake"])
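If you want to avoid hardcoding the flag, here is a minimal sketch of a small helper that builds the argument list. The helper name `build_beam_args` and the bucket path are hypothetical, not part of TFX. `--project` and `--temp_location` are standard Beam Google Cloud options; whether you need `--temp_location` depends on your setup, so treat it as an assumption.

```python
from typing import List, Optional


def build_beam_args(project_id: str, temp_location: Optional[str] = None) -> List[str]:
    """Build the Beam flags to pass as beam_pipeline_args (hypothetical helper)."""
    if not project_id:
        # Without a project, Beam raises "Missing executing project information".
        raise ValueError("project_id must be a non-empty GCP project ID")
    args = [f"--project={project_id}"]
    if temp_location:
        # Assumption: a GCS path for temporary files, e.g. "gs://my-bucket/tmp".
        args.append(f"--temp_location={temp_location}")
    return args


# Usage with the snippet above (not executed here, it needs TFX and GCP access):
# context.run(example_gen, beam_pipeline_args=build_beam_args("gofind-datalake"))
```

The returned list goes in wherever `beam_pipeline_args` is accepted, so the project ID lives in one place.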