Search a dataframe from a list and add column to say found or not

This is my df with 2 columns:

utid  | description
------+-------------------------------------
12342 | my name is 123 amrud and nitesh
 2345 | my name is anil
 2122 | my name is 1234 mohan

and a list such as {"mohan", "nitesh"}.

I need to check whether any element from this list is present in the description column. If so, a new column of the dataframe should say "found", otherwise "not found".

The actual list is far bigger, around 20k elements.

The output dataframe should be like this:

utid  | description                     | foundornot
------+---------------------------------+-----------
12342 | my name is 123 amrud and nitesh | found
 2345 | my name is anil                 | not found
 2122 | my name is 1234 mohan           | found

Any help is welcome

Solution

You can simply define a udf that checks the condition and returns one of the strings "found" or "not found":

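// Keywords to search for (the real list can be much larger)
val list = List("mohan","nitesh")

import org.apache.spark.sql.functions._
// Returns "found" if the description contains any keyword as a substring; null descriptions count as "not found"
val checkUdf = udf((strCol: String) => if (strCol != null && list.exists(strCol.contains)) "found" else "not found")

// Add the result as a new column
df.withColumn("foundornot", checkUdf(col("description"))).show(false)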

That's it. You should get:

+-----+-------------------------------+----------+
|utid |description                    |foundornot|
+-----+-------------------------------+----------+
|12342|my name is 123 amrud and nitesh|found     |
|2345 |my name is anil                |not found |
|2122 |my name is 1234 mohan          |found     |
+-----+-------------------------------+----------+
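
Since the real list has around 20k elements, list.exists(strCol.contains) may scan the whole list for each row. If the keywords only ever appear as whole, whitespace-separated words (an assumption, the question does not say so), a set lookup per word may be cheaper. Here is a minimal sketch of that matching logic in plain Scala, without Spark; KeywordMatch and foundOrNot are hypothetical names used only for illustration:

```scala
// Sketch only: plain Scala, so the matching logic can be tried without Spark.
object KeywordMatch {
  // Hypothetical stand-in for the 20k-element list; a Set gives fast membership checks.
  val keywords: Set[String] = Set("mohan", "nitesh")

  // Assumption: keywords appear as whole, whitespace-separated words.
  // Unlike `contains`, this does NOT match "mohan" inside "mohandas".
  def foundOrNot(description: String): String =
    if (description != null && description.split("\\s+").exists(keywords.contains)) "found"
    else "not found"

  def main(args: Array[String]): Unit = {
    // Sample rows taken from the question
    val rows = Seq(
      12342 -> "my name is 123 amrud and nitesh",
      2345 -> "my name is anil",
      2122 -> "my name is 1234 mohan"
    )
    rows.foreach { case (utid, desc) => println(s"$utid | $desc | ${foundOrNot(desc)}") }
  }
}
```

Note that this changes the matching rule from substring to whole word, so it only fits if that is what you want. In Spark, it should be possible to wrap the function as udf(KeywordMatch.foundOrNot _) and pass it to withColumn in the same way as checkUdf.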

I hope the answer is helpful