Search code examples
pysparkazure-sql-databaseazure-databricks

Delete rows from Azure Sql table using Azure Databricks with Pyspark


Below is the snippet which I am using

azuresqlOptions={ "driver":jdbcDriver, "url":jdbcUrl, "user":username, "port":jdbcPort, "password":password }

query = "(DELETE cone.address WHERE Address_ID=756 ) ad1" df1 = spark.read.format("jdbc").option("header","true").options(**azuresqlOptions).option("dbtable",query).load() display(df1)

I am getting below error:

 com.microsoft.sqlserver.jdbc.SQLServerException: A nested INSERT, UPDATE, DELETE, or MERGE statement must have an OUTPUT clause.

Can any one help me on this


Solution

  • When deleting rows from SQL table in Azure data bricks with sample data

    enter image description here

    with below code:

    from pyspark.sql import SparkSession          
    spark = SparkSession.builder.getOrCreate()    
    Host = "<serverName>.database.windows.net"    
    Port = 1433    
    Database = "<dbName>"    
    Username = "<userName>"    
    Password = "<password>"    
    Driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"    
    table = "<table>"    
    Url = f"jdbc:sqlserver://{Host}:{Port};databaseName={Database}"
    
    connectionProperties = {    
       "user": Username,    
        "password": Password,   
        "driver": Driver
    }
    
    query = f"(Delete {table} where Id = 1) AS subquery"
    df = spark.read.jdbc(url=Url, table=query, properties=connectionProperties)
    df.show() 
    

    I got the same error:

    enter image description here

    I tried below procedure to delete row from table. I have used filter function to delete row:

    from pyspark.sql.functions import *
    df2 = df.filter(col("<condition>") 
    df2.show()
    

    enter image description here

    I write the data frame into Sql table using below code:

    df2.write.format("jdbc").mode("overwrite").option("url", "<Url>").option("dbtable", ""<table>").option("user", "<Username>").option("password", "<Password>").save()
    

    The table updated successfully.

    enter image description here

    Updated:

    You can use below code to execute delete query:

    import pyodbc
    conn = pyodbc.connect( 'DRIVER={ODBC Driver 17 for SQL Server};'
                           'SERVER=<serverNmae>.database.windows.net;'
                           'DATABASE=<db>;UID=<userName>;'
                           'PWD=<password>')
    conn.execute('DELETE <tableName> WHERE <condition>')