azure-databricks, databricks-sql, databricks-workflows

How to create a Databricks SQL object of type Query using a Python script?


Can I create a Databricks SQL object of type Query, save a SQL statement in it, and save it in the workspace using a Python script? Alternatively, is there a way to execute a .sql file from the workspace (instead of Git) in a workflow task of type SQL with the task source set to File? I want to use a SQL warehouse only.

I did some research online but could not find any useful information about this.


Solution

  • If you have a .sql file, the only built-in option for the SQL file task type is Git integration, and even then the file may contain only a single statement.

    You can read about this in Create a new job | Jobs API | REST API reference, under the task of type SQL with file as the source:

    If file, indicates that this job runs a SQL file in a remote Git repository. Only one SQL statement is supported in a file. Multiple SQL statements separated by semicolons (;) are not permitted.
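
    For reference, here is a minimal sketch of what that Git-backed SQL file task could look like when created through the Jobs API. The host, token, repository URL, branch, file path, and warehouse id are all hypothetical placeholders, not values from the original question:

    import requests

    token = "dapi.............."
    host = "https://<Host>.azuredatabricks.net"

    # Sketch of a Jobs API payload that runs a single-statement .sql file from Git
    payload = {
        "name": "run-sql-file-from-git",
        "git_source": {
            "git_url": "https://github.com/<org>/<repo>",  # hypothetical repository
            "git_provider": "gitHub",
            "git_branch": "main",
        },
        "tasks": [
            {
                "task_key": "run_sql_file",
                "sql_task": {
                    "file": {"path": "queries/query.sql", "source": "GIT"},  # hypothetical path in the repo
                    "warehouse_id": "<warehouse_id>",  # your SQL warehouse id
                },
            }
        ],
    }

    response = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    print(response.status_code, response.text)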

    So, to achieve your goal without Git, you need to create a query from the .sql file with the help of the REST API.

    Using the Python code below, you can create a Query and then use it in a job.

    Create a new query definition | Queries / Results API | REST API reference

    The above is the API used to create a new query.

    import requests

    # Personal access token for the workspace (redacted)
    token = "dapi.............."

    # Read the SQL statement from the local .sql file
    sql_file_path = "query.sql"  # path to the SQL file
    with open(sql_file_path, "r") as file:
        sql_commands = file.read()

    # Queries API endpoint (replace <Host> with your workspace host)
    endpoint = "https://<Host>.azuredatabricks.net/api/2.0/preview/sql/queries"

    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

    payload = {
        "data_source_id": "7418cab5-bfba-..........",  # SQL warehouse data source id
        "query": sql_commands,
        "name": "Creating max timestamp",
        "run_as_role": "viewer",
    }

    response = requests.post(endpoint, headers=headers, json=payload)

    if response.status_code == 200:
        print("SQL query created successfully.")
    else:
        print(f"Error: {response.status_code}, {response.text}")

    

    Note that the data_source_id used here is different from the warehouse id; you can look it up with the /api/2.0/preview/sql/data_sources API, as sketched below. Also, for the SQL file path, you can point to a DBFS path if you uploaded your .sql file to DBFS earlier.
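
    As a rough sketch under the same assumptions (host and token as above, hypothetical warehouse id), you can list the data sources and pick the entry whose warehouse_id matches your SQL warehouse:

    import requests

    token = "dapi.............."
    host = "https://<Host>.azuredatabricks.net"

    # Each entry in the response pairs a data source id with a warehouse id
    response = requests.get(
        f"{host}/api/2.0/preview/sql/data_sources",
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()

    warehouse_id = "1234567890abcdef"  # hypothetical warehouse id
    data_source_id = next(
        ds["id"] for ds in response.json() if ds.get("warehouse_id") == warehouse_id
    )
    print(data_source_id)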

    After successful creation, the query will appear in the Queries tab, and you can reference it in a workflow job (see the sketch below).
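
    To wire the saved query into a job, here is a minimal sketch using the Jobs API sql_task of type query. The query id comes from the create-query response above, and the remaining values are hypothetical placeholders:

    import requests

    token = "dapi.............."
    host = "https://<Host>.azuredatabricks.net"

    payload = {
        "name": "run-saved-query",
        "tasks": [
            {
                "task_key": "run_query",
                "sql_task": {
                    "query": {"query_id": "<query_id_from_create_response>"},
                    "warehouse_id": "<warehouse_id>",  # your SQL warehouse id
                },
            }
        ],
    }

    response = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    print(response.status_code, response.text)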
