Tags: pyspark, apache-spark-sql, databricks

Remove & replace characters using PySpark


I have a DataFrame and would like to remove all the brackets from the dob_concat column, separating the three values with hyphens instead.

Before:

+------------+
|  dob_concat|
+------------+
|[1983][6][3]|
+------------+

After:

+------------+
| dob_concat |
+------------+
| 1983-6-3   |
+------------+

Solution

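  • You can use the built-in regexp_replace function, as below. The innermost call turns each "][" into "-", and the two outer calls strip the remaining "[" and "]".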

    from pyspark.sql import functions as F
    df.withColumn("dob_concat", F.regexp_replace(F.regexp_replace(F.regexp_replace("dob_concat", "\\]\\[", "-"), "\\[", ""), "\\]", "")).show()
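To see what each of the three nested replacements does, the same patterns can be tried on the sample value with Python's re module, without a Spark session. This is only a sketch: Spark evaluates regexp_replace with Java regular expressions, but these simple patterns should behave the same way in both. The helper name strip_brackets is made up for illustration.

```python
import re


def strip_brackets(value: str) -> str:
    """Mirror the three nested regexp_replace calls on a plain Python string.

    Hypothetical helper, used only to illustrate the patterns.
    """
    # Innermost call: every "][" pair becomes a hyphen.
    step1 = re.sub(r"\]\[", "-", value)   # "[1983][6][3]" -> "[1983-6-3]"
    # Middle call: drop the remaining opening bracket.
    step2 = re.sub(r"\[", "", step1)      # "[1983-6-3]" -> "1983-6-3]"
    # Outer call: drop the remaining closing bracket.
    step3 = re.sub(r"\]", "", step2)      # "1983-6-3]" -> "1983-6-3"
    return step3


print(strip_brackets("[1983][6][3]"))  # 1983-6-3
```

The order matters: the "][" pairs have to be replaced first, because once the single brackets are stripped there is nothing left to mark where the hyphens belong.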