I have a SQL Server table whose schema differs from my DataFrame's. I would like to select some columns from my DataFrame and "insert into" the table only those values.
Basically, something similar to the SQL below, but in PySpark:
INSERT INTO Cust_Diff_Schema_tbl
(acct_num, name)
SELECT account_no, name
FROM customers
WHERE customer_id > 5000;
I can read the data over JDBC with spark.read, like below:
# "dbtable" and "query" are mutually exclusive, so pass only the query;
# Spark wraps it in a subquery itself, so no extra parentheses are needed
df_s3 = spark.read.format("jdbc")\
.option("driver", db_driver_name)\
.option("url", db_url + ":1433;databaseName=" + stage_db)\
.option("query", "select * from customers")\
.option("user", db_username)\
.option("password", db_password)\
.load()
df_s3.printSchema()
df_s3.show(20)
To write/append the data to the table with only the selected values, I believe I can still use "df_s3.write", but I need an example of how to express that insert (the column mapping) through the ".option" function, or another approach if that does not work.
Thanks in advance.
//create the DataFrame first: fetch from a DB, read a file, or any other source
val df = ??? //your source DataFrame, e.g. the customers data read above

df.write.format("jdbc")
  .option("url", "jdbcURL")
  .option("driver", "Driver name")
  .option("dbtable", "tablename")
  .option("user", "username") //credentials, unless they are part of the URL
  .option("password", "password")
  .option("numPartitions", "20")
  .option("batchsize", "10000")
  .option("truncate", "true") //only honored with mode "overwrite"; ignored on append
  .mode("append")
  .save()
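The JDBC writer does not take a custom INSERT statement through ".option"; the usual approach is to shape the DataFrame so its columns match the target table and then append it. Below is a minimal PySpark sketch of that, reusing df_s3 and the connection variables from the question and the table/column names from the SQL at the top (adjust them to your actual schema):

from pyspark.sql.functions import col

# Keep only the rows and columns the target table should receive, and
# rename them so they match the table's column names: Spark's JDBC writer
# builds the INSERT from the DataFrame's column names.
df_to_insert = df_s3.filter(col("customer_id") > 5000)\
    .select(col("account_no").alias("acct_num"), col("name"))

# mode("append") issues plain INSERTs and leaves the existing rows in place.
df_to_insert.write.format("jdbc")\
    .option("driver", db_driver_name)\
    .option("url", db_url + ":1433;databaseName=" + stage_db)\
    .option("dbtable", "Cust_Diff_Schema_tbl")\
    .option("user", db_username)\
    .option("password", db_password)\
    .mode("append")\
    .save()

If customers is large, the customer_id > 5000 filter can also be pushed into the read by putting it in the "query" option, so only the needed rows come across the JDBC connection.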