How to replace an item in an array with another value in Spark?


+--------------------------------+
|Subject                         |
+--------------------------------+
|[English, Math, Science, Spark] |
|[English, History, Art]         |
+--------------------------------+

How can we replace English with ENGLISH in both rows?


Solution

  • Use a custom UDF to replace the word:

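    import org.apache.spark.sql.functions.udf
    import spark.implicits._  // enables $"..." syntax; assumes a SparkSession named spark

    // Map over the array, swapping "English" for "ENGLISH" and keeping other elements.
    val replace = udf { x: Seq[String] => x.map(y => if (y == "English") "ENGLISH" else y) }

    // Apply the UDF and keep the original column name.
    val df2 = df.select(replace($"Subject").alias("Subject"))

    df2.show(false)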
    +-------------------------------+
    |Subject                        |
    +-------------------------------+
    |[ENGLISH, Math, Science, Spark]|
    |[ENGLISH, History, Art]        |
    +-------------------------------+
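
  • As a possible alternative to the UDF, the built-in `transform` higher-order function (available in Spark SQL from version 2.4) can rewrite each array element without leaving the SQL engine. The sketch below is not part of the original answer. It builds its own local session and sample data so it can run on its own, and the names `df3` and the app name are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("replace-in-array") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Sample data mirroring the Subject column from the question.
val df = Seq(
  Seq("English", "Math", "Science", "Spark"),
  Seq("English", "History", "Art")
).toDF("Subject")

// transform applies the lambda to every element of the array;
// if(cond, a, b) returns "ENGLISH" for a match and the element itself otherwise.
val df3 = df.withColumn(
  "Subject",
  expr("transform(Subject, x -> if(x = 'English', 'ENGLISH', x))")
)

df3.show(false)
```

    This should produce the same two rows as the UDF version. Keeping the logic in a SQL expression lets Spark's optimizer see it, which a UDF does not allow, so it is generally worth trying first when the Spark version supports it.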