
Scala Spark DataFrame: sorting a map column by key


import spark.implicits._
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

def reverseMap(colName:Column) = map_from_arrays(map_values(colName),map_keys(colName))

val testDF = Seq(("cat",Map("black"->3,"brown"->5,"white"->1)), ("dog",Map("cream"->6,"black"->5,"white"->2))).toDF("animal","ageMap")

testDF.show(false)

val testDF1 = testDF.withColumn("keySort",map_from_entries(array_sort(map_entries(col("ageMap")))))

This code runs fine on Spark 3.x, but I need it to run on Spark 2.x (below 3.0).


Solution

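  • From your comment I gather that your code works on v3.2.2 but not on v2.4.5.

    The problem is that map_entries is not available in Spark v2.4.5 (it was only added to the Scala functions API in 3.0.0). You can get the same result by extracting the keys and values separately with map_keys and map_values, and then combining them into an array of (key, value) structs with arrays_zip. Because array_sort compares structs field by field, sorting that array orders the entries by key.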

    The first part is the same as in your code:

    import spark.implicits._
    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions._
    
    def reverseMap(colName:Column) = map_from_arrays(map_values(colName),map_keys(colName))
    val testDF = Seq(("cat",Map("black"->3,"brown"->5,"white"->1)), ("dog",Map("cream"->6,"black"->5,"white"->2))).toDF("animal","ageMap")
    
    testDF.show(false)
    +------+------------------------------------+
    |animal|ageMap                              |
    +------+------------------------------------+
    |cat   |[black -> 3, brown -> 5, white -> 1]|
    |dog   |[cream -> 6, black -> 5, white -> 2]|
    +------+------------------------------------+
    

    The difference is in how testDF1 is defined:

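    val testDF1 = testDF
      // Pull the keys and values out into separate array columns
      .withColumn("keys", map_keys(col("ageMap")))
      .withColumn("values", map_values(col("ageMap")))
      // Zip them into an array of (key, value) structs, sort by key, and rebuild the map
      .withColumn("keySort", map_from_entries(array_sort(arrays_zip(col("keys"), col("values")))))
      // Drop the helper columns
      .select("animal", "ageMap", "keySort")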
    
    testDF1.show(false)
    +------+------------------------------------+------------------------------------+
    |animal|ageMap                              |keySort                             |
    +------+------------------------------------+------------------------------------+
    |cat   |[black -> 3, brown -> 5, white -> 1]|[black -> 3, brown -> 5, white -> 1]|
    |dog   |[cream -> 6, black -> 5, white -> 2]|[black -> 5, cream -> 6, white -> 2]|
    +------+------------------------------------+------------------------------------+
    

    This code ran successfully on a v2.4.5 spark-shell.
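For intuition, the map_keys/map_values, arrays_zip, array_sort, map_from_entries steps can be modelled on an ordinary Scala Map without Spark. This is only a plain-Scala sketch, not Spark code, and the names SortPipelineModel and keySort are hypothetical. Since map keys are unique, sorting the (key, value) pairs compares keys first and the value never acts as a tie-breaker.

```scala
import scala.collection.immutable.ListMap

object SortPipelineModel {
  // Plain-Scala model of the Spark 2.4 pipeline (hypothetical helper, not a Spark API):
  //   map_keys / map_values -> m.keys / m.values
  //   arrays_zip            -> keys.zip(values), giving (key, value) pairs
  //   array_sort            -> sorted; tuples compare the key first, then the value
  //   map_from_entries      -> ListMap(...), which keeps the sorted insertion order
  def keySort(m: Map[String, Int]): ListMap[String, Int] = {
    val keys   = m.keys.toSeq
    val values = m.values.toSeq
    ListMap(keys.zip(values).sorted: _*)
  }

  def main(args: Array[String]): Unit = {
    val dog = Map("cream" -> 6, "black" -> 5, "white" -> 2)
    println(keySort(dog).toList) // List((black,5), (cream,6), (white,2))
  }
}
```

The result matches the dog row of keySort in the output above.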
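If you also need Spark versions older than 2.4, where array_sort, arrays_zip and map_from_entries are not available in the functions API (they were added in 2.4.0), a UDF is one possible fallback. This is a sketch under assumptions, not part of the answer tested above: SortMapUdfSketch, sortByKey and sortMapByKey are hypothetical names, it assumes spark-sql is on the classpath and the column is a map<string,int>, and it relies on ListMap keeping insertion order when Spark converts the returned Map back into a MapType value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import scala.collection.immutable.ListMap

object SortMapUdfSketch {
  // Hypothetical helper: returns the entries of m ordered by key.
  // ListMap preserves insertion order, so the sorted order survives
  // the conversion back to a Spark MapType value. Null maps stay null.
  def sortByKey(m: Map[String, Int]): Map[String, Int] =
    if (m == null) null else ListMap(m.toSeq.sortBy(_._1): _*)

  // Hypothetical UDF wrapping sortByKey for use on a map<string,int> column
  val sortMapByKey: UserDefinedFunction = udf(sortByKey _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sort-map-udf").getOrCreate()
    import spark.implicits._

    val testDF = Seq(
      ("cat", Map("black" -> 3, "brown" -> 5, "white" -> 1)),
      ("dog", Map("cream" -> 6, "black" -> 5, "white" -> 2))
    ).toDF("animal", "ageMap")

    // Adds a keySort column holding the same map with its entries ordered by key
    testDF.withColumn("keySort", sortMapByKey(col("ageMap"))).show(false)
    spark.stop()
  }
}
```

UDFs are opaque to the Catalyst optimizer, so on 2.4 and later the built-in functions shown in the answer are usually the better choice.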