Search code examples
pythondataframepysparkdatabricks

Dataframe: how to lookup values in another table, if not found, return blank/Databricks, pyspark


I'm working in Databricks to create data frame based on two existing tables. I need to look up values in 2nd table, return the value, if not found, then return blank. for example: df1: | Material | KG | | -------- | -------- | | 1 | 10 | | 2 | 50 | | 3 | 100 | | 4 | 80 | | 5 | 60 |

df2:

Material Conversion
1 1.5
3 5
5 12

I'd like to have the below df3

Material KG Conversion
1 10 1.5
2 50
3 100 5
4 80
5 60 12

I used the below to add, but it join only those found in df2, so it only shows material 135 and removed Material 24, I still want to keep 1-5 df3 = df1.join(df2,["Material"]).withColumnRenamed("Material", "Material_new").drop("Material_new") display(df3)

I'd like to have the below df3

Material KG Conversion
1 10 1.5
2 50
3 100 5
4 80
5 60 12

Solution

  • Try with left join for this case.

    Example:

    df = spark.createDataFrame([(1,10),(2,50),(3,100),(4,80),(5,60)],['Material','KG'])
    df1 = spark.createDataFrame([(1,1.5),(3,5.0),(5,12.0)],['Material','Conversion'])
    df.join(df1,['Material'],'left').show(10,False)
    #+--------+---+----------+
    #|Material|KG |Conversion|
    #+--------+---+----------+
    #|1       |10 |1.5       |
    #|2       |50 |null      |
    #|3       |100|5.0       |
    #|4       |80 |null      |
    #|5       |60 |12.0      |
    #+--------+---+----------+
    

    Replace null with ' '

    from pyspark.sql.functions import *
    df = spark.createDataFrame([(1,10),(2,50),(3,100),(4,80),(5,60)],['Material','KG'])
    df1 = spark.createDataFrame([(1,1.5),(3,5.0),(5,12.0)],['Material','Conversion'])
    df.join(df1,['Material'],'left').\
      withColumn("conversion",when(col("conversion").isNull(),lit(" ")).\
        otherwise(col("conversion"))).\
          show(10,False)
    #+--------+---+----------+
    #|Material|KG |conversion|
    #+--------+---+----------+
    #|1       |10 |1.5       |
    #|2       |50 |          |
    #|3       |100|5.0       |
    #|4       |80 |          |
    #|5       |60 |12.0      |
    #+--------+---+----------+