val rdd = sc.parallelize(List(41, 42, 43, 44, 45, 46, 47, 48, 49, 50))
val df = rdd.toDF("numbers")
df.select(when($"numbers" % 2 === 0, $"numbers").otherwise("").as("Even"),
          when($"numbers" % 2 === 1, $"numbers").otherwise("").as("Odd"))
  .orderBy("Even", "Odd")
  .show
+----+---+
|Even|Odd|
+----+---+
| | 41|
| | 43|
| | 45|
| | 47|
| | 49|
| 42| |
| 44| |
| 46| |
| 48| |
| 50| |
+----+---+
I want to remove the empty values in both the Even and Odd columns. How can I do that?
Expected Output:
+----+---+
|Even|Odd|
+----+---+
| 42| 41|
| 44| 43|
| 46| 45|
| 48| 47|
| 50| 49|
+----+---+
Not sure what your use case is here, but you can create separate dataframes of the even and odd values, zip them together using the RDD API, and then convert the result back to a dataframe. It's clunky, but it's not a problem that's really in Spark's wheelhouse.
import org.apache.spark.sql.Row
import spark.implicits._  // needed for toDF outside spark-shell

val df = List(41, 42, 43, 44, 45, 46, 47, 48, 49, 50).toDF("numbers")

// Split the numbers into two RDDs of Rows, one per parity.
val evenRDD = df.where('numbers % 2 === 0).rdd
val oddRDD  = df.where('numbers % 2 === 1).rdd

// Pair the rows up positionally and pull the Int back out of each Row.
val df2 = evenRDD.zip(oddRDD).map {
  case (x: Row, y: Row) => (x.getInt(0), y.getInt(0))
}.toDF("even", "odd")
df2.show
+----+---+
|even|odd|
+----+---+
| 42| 41|
| 44| 43|
| 46| 45|
| 48| 47|
| 50| 49|
+----+---+
zip will only work if you have equal numbers of odd and even values in your initial dataframe (under the hood it requires the two RDDs to have the same number of partitions and the same number of elements in each partition). If not, you'll have to make them equal, either by trimming the excess off the larger side or by padding the smaller side with zeroes or some other indicator of nullity; one way to sidestep the restriction entirely is sketched below.
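If you're fine with the shorter column ending in nulls, a rough sketch of an alternative (assuming spark.implicits._ is in scope; the idx/evens/odds names are just illustrative) is to index each side with zipWithIndex and do a full outer join on the index instead of zipping:

val evens = df.where('numbers % 2 === 0).rdd.zipWithIndex
  .map { case (row, i) => (i, row.getInt(0)) }.toDF("idx", "even")
val odds = df.where('numbers % 2 === 1).rdd.zipWithIndex
  .map { case (row, i) => (i, row.getInt(0)) }.toDF("idx", "odd")

// Rows with no partner on the other side come through with null in that column.
evens.join(odds, Seq("idx"), "outer").orderBy("idx").drop("idx").show

It does more shuffling than a plain zip, but it doesn't care whether the even and odd counts match.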