In my PySpark notebook, I do the following steps:
#1. I do `spark.read.format("delta").load(path)`
#2. I do `df = df.filter(...).groupBy(...).agg(...)`
#3. I do `df.write.format("delta").mode("append").save(output_folder)`
#4. I do `spark.sql(f"CREATE TABLE IF NOT EXISTS {sql_database}.{table_name} USING delta LOCATION '{path to output folder in #3}'") `
The issue: I have debug print statements before and after steps 3 and 4, I have verified that there are Parquet files in my output folder and that the path I use when creating the SQL table is correct, and no exception appears in the console.

But when I look for that newly created table in my SQL tool, I can't see it.

Do I need to wait for the write to finish before I create the SQL table? If so, what do I need to do to wait for the write to be done?
By default, `.write` is a synchronous operation, so step 4 will only execute after step 3 has finished. But it's easier to write the data and create the table in one step by using `.saveAsTable` together with the `path` option:
df = spark.read.format("delta").load(path)
df = df.filter(...).groupBy(...).agg(...)
df.write.format("delta").mode("append") \
    .option("path", output_folder) \
    .saveAsTable(f"{sql_database}.{table_name}")
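If you then want to confirm that the table was actually registered, you can ask the catalog. A minimal sketch, assuming a live SparkSession bound to `spark` (the `tableExists` call requires Spark 3.3+, and the database/table names below are hypothetical placeholders for your own variables):

```python
# Sketch: verify the table registration after saveAsTable.
# `sql_database` and `table_name` stand in for your own values.
sql_database = "my_db"
table_name = "my_table"
qualified = f"{sql_database}.{table_name}"

# Catalog API (Spark 3.3+) -- returns True once the table is registered:
#   spark.catalog.tableExists(qualified)

# Or list everything registered in the database:
#   spark.sql(f"SHOW TABLES IN {sql_database}").show()
```

Note that because the `path` option is set, `saveAsTable` registers the table as external (unmanaged), so dropping it later removes only the metadata, not the Delta files in `output_folder`.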