I'm trying to replace a escape character with NULL in pyspark dataframe. Data in dataframe looks like below
Col1|Col2|Col3
1|\026\026|026|abcd026efg.
Col2 is a garbage data and trying to replace with NULL. Tried replace and regex_replace functions to replace '\026' with Null value, because of escape character (" \ "), data is not replaced with NULL value.
replace(col2, "026", 'abcd')
replace(Col2, "\026", 'abcd')
Finally,
I want my data as
Col1|Col2|Col3
1|NULL|026|abcd026efg.
Highly appreciate for thoughts to resolve this scenario.
Thanks -EVR
Use replace all digits and preceding non digits
import pyspark.sql.functions as F
df.withColumn('col2',F.regexp_replace('col2','\D\d+',None)).show()
+----+----+-----------+
|col1|col2| col3|
+----+----+-----------+
| 1|null|abcd026efg.|
+----+----+-----------+