val rdd = sc.parallelize(List(41, 42, 43, 44, 45, 46, 47, 48, 49, 50))
val df = rdd.toDF("numbers")
df.select(when($"numbers" % 2 === 0, $"numbers").otherwise("").as("Even"),
          when($"numbers" % 2 === 1, $"numbers").otherwise("").as("Odd"))
  .orderBy("Even", "Odd")
  .show
+----+---+
|Even|Odd|
+----+---+
| | 41|
| | 43|
| | 45|
| | 47|
| | 49|
| 42| |
| 44| |
| 46| |
| 48| |
| 50| |
+----+---+
I want to remove the empty values in both the Even and Odd columns. How can I do that?
Expected Output:
+----+---+
|Even|Odd|
+----+---+
| 42| 41|
| 44| 43|
| 46| 45|
| 48| 47|
| 50| 49|
+----+---+
Not sure what your use case is here, but you can create separate dataframes of the even and odd values, zip them together using the RDD API, and then convert the result back to a dataframe. It's clunky, but it's not a problem that's really in Spark's wheelhouse.
import org.apache.spark.sql.Row
import spark.implicits._  // needed for toDF outside spark-shell

val df = List(41, 42, 43, 44, 45, 46, 47, 48, 49, 50).toDF("numbers")

// Split the numbers into two RDDs of Rows, one per parity.
val evenRDD = df.where('numbers % 2 === 0).rdd
val oddRDD  = df.where('numbers % 2 === 1).rdd

// Pair the rows up positionally and pull the Int back out of each Row.
val df2 = evenRDD.zip(oddRDD).map {
  case (x: Row, y: Row) => (x.getInt(0), y.getInt(0))
}.toDF("even", "odd")
df2.show
+----+---+
|even|odd|
+----+---+
| 42| 41|
| 44| 43|
| 46| 45|
| 48| 47|
| 50| 49|
+----+---+
zip will only work if you have equal numbers of odd and even values in your initial dataframe (under the hood it requires the two RDDs to have the same number of partitions and the same number of elements in each partition). If not, you'll have to make them equal, either by trimming the excess off the larger side or by padding the smaller side with zeroes or some other indicator of nullity; one way to sidestep the restriction entirely is sketched below.
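If you're fine with the shorter column ending in nulls, a rough sketch of an alternative (assuming spark.implicits._ is in scope; the idx/evens/odds names are just illustrative) is to index each side with zipWithIndex and do a full outer join on the index instead of zipping:

val evens = df.where('numbers % 2 === 0).rdd.zipWithIndex
  .map { case (row, i) => (i, row.getInt(0)) }.toDF("idx", "even")
val odds = df.where('numbers % 2 === 1).rdd.zipWithIndex
  .map { case (row, i) => (i, row.getInt(0)) }.toDF("idx", "odd")

// Rows with no partner on the other side come through with null in that column.
evens.join(odds, Seq("idx"), "outer").orderBy("idx").drop("idx").show

It does more shuffling than a plain zip, but it doesn't care whether the even and odd counts match.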