Is left join and right_outer join the same if the tables are positioned differently, in pyspark?

I have 2 dataframes in PySpark,

df1 = spark.createDataFrame([
    ("s1", "artist1"),
    ("s2", "artist2"),
    ("s3", "artist3"),
    ],
    ['song_id', 'artist'])


df1.show()

df2 = spark.createDataFrame([
    ("s1", "2"),
    ("s1", "3"),
    ("s4", "4"),
    ("s4", "5")
    ],
    ['song_id', 'duration'])

df2.show()

Output:

+-------+-------+
|song_id| artist|
+-------+-------+
|     s1|artist1|
|     s2|artist2|
|     s3|artist3|
+-------+-------+



+-------+-----+
|song_id|col_2|
+-------+-----+
|     s1|  hmm|
|     s1| hmmm|
|     s4| acha|
|     s4| ohoo|
+-------+-----+

I apply right_outer and left join on these 2 dataframes and they both seem to give me the same result-

df1.join(df2, on="song_id", how="right_outer").show()
df2.join(df1, on="song_id", how="left").show()

Output:

 +-------+-------+--------+
|song_id| artist|duration|
+-------+-------+--------+
|     s1|artist1|       2|
|     s1|artist1|       3|
|     s4|   null|       4|
|     s4|   null|       5|
+-------+-------+--------+

+-------+--------+-------+
|song_id|duration| artist|
+-------+--------+-------+
|     s1|       2|artist1|
|     s1|       3|artist1|
|     s4|       4|   null|
|     s4|       5|   null|
+-------+--------+-------+

I am not sure how to use these 2 joins effectively. What is the difference between these 2 joins?

Solution

The left and right joins gives result based on the order of table respective to join keyword.

Left/leftouter/left_outer joins are all same and shows the whole left table and the matching records of the right table.

Right/rightouter/right_outer joins are all same and shows the whole right table and the matching records of the left table.

In the code

df1.join(df2, on="song_id", how="right_outer").show()

df1 is the left table(dataframe) and df2 is the right table and join type is right_outer, hence it shows all the rows of df2 and matching rows of the df1.

Similarly in

df2.join(df1, on="song_id", how="left").show()

df2 is the left table and df1 is the right table and the join type is left, so it shows all records of df2 and matching records of df1.

Hence both code shows the same result.

df1.join(df2, on="song_id", how="right_outer").show()
df1.join(df2, on="song_id", how="left").show()

In the above code, I have placed df1 as left table in both queries. And here is the result:-

song_id	artist	duration
s1	artist1	2
s1	artist1	3
s4	null	4
s4	null	5

song_id	artist	duration
s1	artist1	2
s1	artist1	3
s2	artist2	null
s3	artist3	null

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.join.html#pyspark.sql.DataFrame.join

You can use this for reference.