tensorflow, google-cloud-platform, google-bigquery, tensorflow2.0, tfx

Issue using BigQuery as ExampleGen in TensorFlow Extended (TFX)


I'm learning to build a machine learning pipeline with TensorFlow Extended (TFX). I followed the tutorial and would now like to build my own pipeline, but I get an error when I ingest the data directly from BigQuery. Please advise, and thanks in advance!

CODE:

from tfx.components.example_gen.big_query_example_gen.component import BigQueryExampleGen

query = """
    SELECT * FROM `<project_id>.<database>.<table_name>`
"""
example_gen = BigQueryExampleGen(query=query)

ERROR:

RuntimeError: Missing executing project information. Please use the --project command line option to specify it.

Solution

  • I'm not sure if you have solved this already, but to use BigQuery as input you must pass the --project flag to Beam through beam_pipeline_args, like so:

    example_gen = components.BigQueryExampleGen(query='SELECT * except(day) FROM `gofind-datalake.data.temp_dist` where rand() < 2800/30713393 limit 3000')
    context.run(example_gen, beam_pipeline_args=["--project=gofind-datalake"])
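
    If you set this up in more than one place, it may help to build the flag list once and reuse it. Below is a minimal sketch, not TFX API: build_beam_args is a hypothetical helper, "my-gcp-project" and "gs://my-bucket/tmp" are placeholders, and the fallback to the GOOGLE_CLOUD_PROJECT environment variable is an assumption about your environment. The --temp_location flag is included because Beam's BigQuery reads often need a GCS scratch location as well; drop it if your setup works without it.

```python
import os


def build_beam_args(project=None, temp_location=None):
    """Assemble Beam pipeline flags for a BigQuery-backed ExampleGen.

    Hypothetical helper for illustration; not part of TFX or Apache Beam.
    """
    # Assumption: fall back to GOOGLE_CLOUD_PROJECT if no project is passed.
    project = project or os.environ.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise ValueError(
            "No GCP project given; pass project=... or set GOOGLE_CLOUD_PROJECT"
        )

    # --project is the flag the RuntimeError above asks for.
    args = ["--project=" + project]

    # Optional GCS scratch location for Beam's BigQuery reads.
    if temp_location:
        args.append("--temp_location=" + temp_location)
    return args


# Placeholder values; replace with your own project and bucket.
beam_args = build_beam_args(
    project="my-gcp-project",
    temp_location="gs://my-bucket/tmp",
)
print(beam_args)

# In a notebook where `context` and `example_gen` are defined as above:
#   context.run(example_gen, beam_pipeline_args=beam_args)
```

    The resulting list can then be passed as beam_pipeline_args exactly as in the answer above.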