I have a dataframes like below. Tried Join and isin functions but not getting the expected output like below. Not sure what I was missing. Appreciate if someone can help. Thanks.
DF1:
Name | Grade |
---|---|
Tom | A |
Mike | B |
John | C |
DF2:
Approved_Grades | Description |
---|---|
A | Approved grade |
Expected Output:
Name | Grade | Approved |
---|---|---|
Tom | A | Yes |
Mike | B | No |
John | C | No |
Please use 'join' in pyspark to join tables and then using conditions to filter out the result. Check the solution below.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when
spark = SparkSession.builder.getOrCreate()
data1 = [('Tom', 'A'),
('Mike', 'B'),
('John', 'C')]
df1 = spark.createDataFrame(data1, ['Name', 'Grade'])
data2 = [('A', 'Approved grade')]
df2 = spark.createDataFrame(data2, ['Approved_Grades', 'Description'])
merged_df = df1.join(df2, df1.Grade == df2.Approved_Grades, 'left') \
.withColumn('Approved', when(col('Approved_Grades').isNull(), 'No').otherwise('Yes')) \
.drop('Approved_Grades', 'Description')
merged_df.show()
Please use the above code and try if that works. thanks