input
+--------------+-----------------------+-----------------------+
|ID |Subject |Marks |
+--------------+-----------------------+-----------------------+
|1 |maths,physics |80,90 |
|2 |Computer |73 |
|3 |music,sports,chemistry |76,89,85 |
+--------------+-----------+-----------+-----------------------+
Expected output
+--------------+-----------------------+-----------------------+
|ID |Subject |Marks |
+--------------+-----------------------+-----------------------+
|1 |maths |80 |
|1 |physics |90 |
|2 |Computer |73 |
|3 |music |76 |
|3 |sports |89 |
|3 |chemistry |85 |
+--------------+-----------+-----------+-----------------------+
need help in getting this expected output ,have already tried explode function but that only works on single column
Another way;, split the columns on ,
to form arrays. Zip the arrays and leverage pysparks' inline function to achieve what you want
df.withColumn('Subject', split(col("Subject"),",")).withColumn('Marks', split(col("Marks"),",")).selectExpr('ID','inline(arrays_zip(Subject,Marks))')
+---+---------+-----+
| ID| Subject|Marks|
+---+---------+-----+
| 1| maths| 80|
| 1| physics| 90|
| 2| Computer| 73|
| 3| music| 76|
| 3| sports| 89|
| 3|chemistry| 85|
+---+---------+-----+