google-bigquery, google-cloud-dataproc

Google Dataproc and BigQuery integration with custom query


I am running a Spark cluster on Google Dataproc. I would like to get data from BigQuery using a custom query. I am able to run the basic word count example, but I am looking for a way to run a custom query, e.g.

SELECT ROW_NUMBER() OVER() as Id, prop11 FROM (
    SELECT prop11 FROM (
        TABLE_DATE_RANGE([mapping.abc_v2_], DATE_ADD(CURRENT_TIMESTAMP(), -1, 'MONTH'), CURRENT_TIMESTAMP())
    ) WHERE (prop11 IS NOT null AND prop11 !="") GROUP EACH BY prop11
)

Is there a Java API in the Hadoop BigQuery connector for this?


Solution

  • Currently the BigQuery Connector for Hadoop does not have a supported mechanism for executing BigQuery queries.

    If your query can be expressed as Spark SQL or as Spark transforms, you can instead use the connector's export from BigQuery to GCS (the current BigQuery Hadoop Connector workflow) and then use Spark to produce the final result, as in the sketch below.
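
    A minimal sketch of that workflow in Scala, assuming the connector's standard `GsonBigQueryInputFormat` read path. The project ID, staging bucket, and dated table ID (`mapping.abc_v2_20160101`) are placeholders you would substitute; the connector reads one table at a time, so the `TABLE_DATE_RANGE` union would need to be handled by reading each dated table (or pre-materializing a table in BigQuery). `zipWithIndex` stands in for `ROW_NUMBER() OVER()`, which here only assigns an arbitrary id.

        import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, GsonBigQueryInputFormat}
        import com.google.gson.JsonObject
        import org.apache.hadoop.io.LongWritable
        import org.apache.spark.{SparkConf, SparkContext}

        object Prop11Dedup {
          def main(args: Array[String]): Unit = {
            val sc = new SparkContext(new SparkConf().setAppName("bq-prop11-dedup"))
            val conf = sc.hadoopConfiguration

            // Hypothetical project, staging bucket, and table names -- replace with your own.
            val projectId = "your-project-id"
            conf.set(BigQueryConfiguration.PROJECT_ID_KEY, projectId)
            conf.set(BigQueryConfiguration.GCS_BUCKET_KEY, "your-staging-bucket")
            BigQueryConfiguration.configureBigQueryInput(conf, s"$projectId:mapping.abc_v2_20160101")

            // The connector exports the table to GCS and exposes it as (row offset, JSON record) pairs.
            val tableData = sc.newAPIHadoopRDD(
              conf,
              classOf[GsonBigQueryInputFormat],
              classOf[LongWritable],
              classOf[JsonObject])

            // Spark-transform equivalent of the query's filter + GROUP EACH BY prop11.
            val prop11Values = tableData
              .map { case (_, record) => record.get("prop11") }
              .filter(v => v != null && !v.isJsonNull && v.getAsString.nonEmpty)
              .map(_.getAsString)
              .distinct()

            // Assign an arbitrary 1-based id, analogous to ROW_NUMBER() OVER().
            val withIds = prop11Values.zipWithIndex.map { case (value, idx) => (idx + 1, value) }

            withIds.take(10).foreach(println)
            sc.stop()
          }
        }

    The same filter-and-distinct step could also be written in Spark SQL after converting the JSON records to a DataFrame; the RDD form is shown because it maps directly onto the connector's word count example.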