Search code examples
hivegoogle-cloud-platformairflowgoogle-cloud-dataproc

Unable to query using file in Data Proc Hive Operator


I am unable to query with .sql file in DataProcHiveOperator. Though the documentation tells that we can query using file. Link of the documentation Here

It is working fine when I give query directly Here is my sample code which is working fine with writing query directly:

HiveInsertingTable = DataProcHiveOperator(task_id='HiveInsertingTable',
gcp_conn_id='google_cloud_default', 
query='CREATE TABLE TABLE_NAME(NAME STRING);',
cluster_name='cluster-name',
region='us-central1',
dag=dag)

Querying with file :

HiveInsertingTable = DataProcHiveOperator(task_id='HiveInsertingTable',
gcp_conn_id='google_cloud_default', 
query='gs://us-central1-bucket/data/sample_hql.sql',
query_uri="gs://us-central1-bucket/data/sample_hql.sql
cluster_name='cluster-name',
region='us-central1',
dag=dag)

There is no error on sample_hql.sql script.
It is reading file location as a query and throwing me the error as:

Query: 'gs://bucketpath/filename.q'
Error occuring - cannot recognize input near 'gs' ':' '/'

Similar issue has also been raised Here


Solution

  • The issue is because you have passed query='gs://us-central1-bucket/data/sample_hql.sql' as well.

    You should pass exactly 1 of query or queri_uri.

    The code in your question has both of them, so remove query or use the following code:

    HiveInsertingTable = DataProcHiveOperator(task_id='HiveInsertingTable',
        gcp_conn_id='google_cloud_default', 
        query_uri="gs://us-central1-bucket/data/sample_hql.sql",
        cluster_name='cluster-name',
        region='us-central1',
        dag=dag)