I am trying to connect to Databricks cluster via ODBC to run some SQL queries from my local python script.
I am following the Connect Python and pyodbc to Databricks guide and I am able to get the data from the samples
table given in the example (SELECT * FROM samples.nyctaxi.trips LIMIT 10
).
Wheneven I change the query to one reading from my s3 bucket I get this error:
cursor.execute("SELECT * FROM delta.`s3:/my-bucket/some/path/to/dataset/` limit 10")
pyodbc.ProgrammingError: ('42S02', '[42S02] [Simba][SQLEngine] (31740) Table or view not found: .delta.s3://my-bucket/some/path/to/dataset/ (31740) (SQLExecDirectW)')
same query works when executed from a Databricks Notebook on the same cluster.
Do I need to set some special properties? Is the ODBC driver having trouble with the backticks (`)?
I am running the exact saem code from the Databricks guide:
import pyodbc
import os
conn = pyodbc.connect(
"Driver=/Library/simba/spark/lib/libsparkodbc_sb64-universal.dylib;" +
f"Host={os.getenv('DATABRICKS_HOST')};" +
"Port=443;" +
f"HTTPPath={os.getenv('DATABRICKS_HTTP_PATH')};" +
"SSL=1;" +
"ThriftTransport=2;" +
"AuthMech=3;" +
"UID=token;" +
f"PWD={os.getenv('DATABRICKS_TOKEN')}",
autocommit = True
)
# Run a SQL query by using the preceding connection.
cursor = conn.cursor()
cursor.execute("SELECT * FROM samples.nyctaxi.trips")
# Print the rows retrieved from the query.
for row in cursor.fetchall():
print(row)
Databricks ODBC docs page mentions:
The ODBC driver accepts SQL queries in ANSI SQL-92 dialect and translates the queries to the Databricks SQL dialect. However, if your application generates Databricks SQL directly or your application uses any non-ANSI SQL-92 standard SQL syntax specific to Databricks, Databricks recommends that you set UseNativeQuery=1 as a connection configuration. With that setting, the driver passes the SQL queries verbatim to Databricks.
Apparently specific constructs like select from delta.path
can be rewritten (corrupted) using this mechanism. I'm not sure if it's a driver bug or just some ODBC specifics. Anyway, as recommended here, adding UseNativeQuery=1
to connection string will prevent this issue.