I am looking to convert a Python if/else statement, written in Spyder, to PySpark in Azure Databricks. I am still rather new to PySpark and not sure how to approach this; ideas would be appreciated. The Python code sample to be converted is as follows:
length_df = df.shape[0]
if length_df > 0:
    qry = '''
    select *
    from df where idnum not in
    (select idnum from df_Prev)
    '''
else:
    df_new = df.copy()
I tried using the when/otherwise combination that I read about on the PySpark dataframes I created, but kept getting errors because, among other things, it appears to expect a column. I understand that I could use the %sql magic command to build a statement; however, the ask is to have it as Python in Databricks rather than SQL.
If I understand your question correctly, you need a way in PySpark to filter one dataframe where a key does not exist in another dataframe.
Code to create the dataframes:
df = spark.createDataFrame(
    [
        (1, "something1"),
        (2, "something2"),
        (3, "something3"),
    ],
    ["id", "label"],
)

prev_df = spark.createDataFrame(
    [
        (2, "prev2"),
    ],
    ["id", "label"],
)
Now join the dataframes with an anti join (it returns the rows of the left dataframe that have no match in the right one):
if df.count() > 0:
    return_df = df.join(prev_df, "id", "anti")
else:
    return_df = df
Show the results:
display(return_df)