I am new to Scala and struggling with this use case. How can I remove the elements of a list from a column in a dataframe?
I have a list of names, and I need to remove each name wherever it appears in the dataframe column.
I have a dataframe like
utid|description
12342|my name is daniel
2345|my name is harry and i love sports
2122|his wife sofia is my schoolmate
and a list
list { "harry", "daniel" }
The output should be like
utid|description
12342|my name is
2345|my name is and i love sports
2122|his wife sofia is my schoolmate
The simplest way is to use the built-in regexp_replace function, joining the names into a single alternation pattern:
val list = List("harry","daniel")
import org.apache.spark.sql.functions._
df.withColumn("description", regexp_replace(col("description"), list.mkString("(", ")|(", ")"), "")).show(false)
which should give you
+-----+-------------------------------+
|utid |description                    |
+-----+-------------------------------+
|12342|my name is                     |
|2345 |my name is  and i love sports  |
|2122 |his wife sofia is my schoolmate|
+-----+-------------------------------+
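Note that a plain alternation such as (harry)|(daniel) also matches inside longer words (for example "danielle") and leaves behind the spaces that surrounded each removed name. If that matters, one possible refinement is sketched below in plain Scala, without Spark, so the pattern can be checked on its own. The names NamePattern, build and strip are hypothetical helpers introduced for illustration only. The sketch assumes names should match only as whole words and that collapsing runs of whitespace is acceptable.

```scala
import java.util.regex.Pattern

// Hypothetical helper object, not part of Spark or of the original answer.
object NamePattern {
  // Builds one alternation that matches any of the names as a whole word.
  // Pattern.quote escapes regex metacharacters that a name might contain.
  def build(names: Seq[String]): String =
    names.map(Pattern.quote).mkString("\\b(?:", "|", ")\\b")

  // Removes the names, then collapses leftover whitespace and trims the ends.
  def strip(text: String, names: Seq[String]): String =
    text.replaceAll(build(names), "").replaceAll("\\s+", " ").trim
}
```

Since Spark's regexp_replace takes a Java regular expression, the same pattern string should be usable on the column, for example trim(regexp_replace(regexp_replace(col("description"), NamePattern.build(list), ""), "\\s+", " ")). Treat this as a sketch to adapt rather than a drop-in replacement.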
I hope the answer is helpful