Tags: airflow, databricks, aws-databricks

Specify a database name in Databricks SQL connection parameters


I am using Airflow 2.0.2 to connect to Databricks using the airflow-databricks-operator. The SQL operator doesn't let me specify the database where the query should be executed, so I have to prefix every table_name with the database_name. I read through the databricks-sql-connector docs here -- https://docs.databricks.com/dev-tools/python-sql-connector.html -- and still couldn't figure out whether I can pass the database name as a parameter in the connection string itself.

I tried setting database/schema/namespace in the **kwargs, but no luck. The query executor keeps reporting that the table is not found, because the query keeps getting executed against the default database.
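For illustration, here is a sketch of the kind of attempt described above. The hostname, HTTP path, and token are placeholders, and the commented-out `schema` keyword is hypothetical -- the connector does not accept it, which is why queries keep running against the default database:

```python
from databricks import sql

# Placeholders for the workspace-specific connection details.
connection = sql.connect(
    server_hostname="<workspace>.cloud.databricks.com",
    http_path="/sql/1.0/endpoints/<endpoint-id>",
    access_token="<personal-access-token>",
    # schema="my_database",  # hypothetical kwarg - not supported by the
    #                        # connector, so it cannot set the database
)

cursor = connection.cursor()
cursor.execute("SELECT * FROM my_table")  # resolved in the default database
```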


Solution

  • Right now it's not supported. The primary reason is that if you have multiple statements, the connector could reconnect between their execution, and the result of `USE` would be lost. databricks-sql-connector also doesn't allow setting the default database.

    Right now you can work around that by adding an explicit `USE <database>` statement to the list of SQL statements to execute (the `sql` parameter can be a list of strings, not only a single string); see the sketch below.

    P.S. I'll take a look; maybe I'll add setting of the default catalog/database in a future version.
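A minimal sketch of the workaround, assuming the `DatabricksSqlOperator` from the Databricks provider package; the connection id, endpoint name, and database/table names are placeholders:

```python
from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator

# Passing a list of statements: the USE statement is executed before the
# SELECT, so the table reference resolves against my_database instead of
# the default database.
select_task = DatabricksSqlOperator(
    task_id="select_from_my_database",
    databricks_conn_id="databricks_default",  # placeholder connection id
    sql_endpoint_name="my-endpoint",          # placeholder SQL endpoint
    sql=[
        "USE my_database",
        "SELECT * FROM my_table",
    ],
)
```

The same idea applies when using databricks-sql-connector directly: execute `USE <database>` on the cursor before the actual query.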