I am receiving values like 7,432,818 (Imps) and need to load them into a column of type decimal(20,3). I am trying to remove '(Imps)', but the brackets are not removed by regexp_replace.
I am using the code below:
validated_df=validated_df.withColumn('MeasurePer', F.regexp_replace('MeasurePer', ',', ''))
validated_df=validated_df.withColumn('MeasurePer', F.regexp_replace('MeasurePer', '(Imps)', ''))
The result I get is:
7432818 ()
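I think all you need is to escape the parentheses. In a regular expression, unescaped ( and ) delimit a group, so the pattern '(Imps)' matches only the text Imps and leaves the brackets behind. Use \(Imps\) instead: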
validated_df=validated_df.withColumn('MeasurePer', F.regexp_replace('MeasurePer', r'\(Imps\)', ''))
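To see why the original pattern leaves "()" behind, here is a minimal sketch using Python's built-in re module as a stand-in for Spark's regex engine. That substitution is an assumption made so the snippet runs without a Spark session; for patterns this simple the two should behave the same way.

```python
import re

raw = "7,432,818 (Imps)"
no_commas = raw.replace(",", "")  # same effect as the first regexp_replace call

# Unescaped: the parentheses form a group, so only the text "Imps" is matched.
unescaped = re.sub("(Imps)", "", no_commas)

# Escaped: the parentheses are matched as literal characters.
escaped = re.sub(r"\(Imps\)", "", no_commas)

print(repr(unescaped))  # '7432818 ()'
print(repr(escaped))    # '7432818 '
```

Note the raw string (r'...'): without it, Python itself warns about the unknown escape sequence \( in recent versions.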
(Or)
Use alternation (i.e. |) in the regular expression to remove both the commas and '(Imps)' in a single call.
from pyspark.sql.functions import col, regexp_replace
df=spark.createDataFrame([('7,432,818 (Imps)',)],['dec'])
df=df.withColumn("dec",regexp_replace(col("dec"),r"(,|\(Imps\))",""))
df.show()
+--------+
|     dec|
+--------+
|7432818 |
+--------+
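The output above still has a trailing space after the digits. As a hedged sketch, adding \s* in front of the escaped parentheses removes that space as well. The snippet uses Python's re and decimal modules so it can run without Spark; clean_measure is a hypothetical helper name, and quantize to three places only imitates what a decimal(20,3) column would hold.

```python
import re
from decimal import Decimal

def clean_measure(text):
    # Hypothetical helper: drop commas, and drop "(Imps)" together with
    # any whitespace that precedes it.
    return re.sub(r",|\s*\(Imps\)", "", text)

cleaned = clean_measure("7,432,818 (Imps)")

# Imitate decimal(20,3) by fixing the scale to three decimal places.
value = Decimal(cleaned).quantize(Decimal("0.001"))

print(repr(cleaned))  # '7432818'
print(value)          # 7432818.000
```

The same pattern string should also work as the second argument of regexp_replace, followed by .cast('decimal(20,3)') on the column, though that combination is not run here.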