I want to insert a symbol (here, a space) between two regex capture groups.
My code is as follows:
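from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, lit, regexp_extract

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('ab',)], ['str'])
rows = df.select(
    concat(
        regexp_extract('str', r'(\w)(\w)', 1),  # extract the first group
        lit(' '),                               # insert the symbol (a space)
        regexp_extract('str', r'(\w)(\w)', 2)   # extract the second group
    ).alias('d')
).collect()
print(rows)  # [Row(d='a b')]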
Is there a better way?
You can use regexp_replace with backreferences to the capture groups ($1 and $2 in the replacement string):
import pyspark.sql.functions as F
df.select(F.regexp_replace('str', r'(\w)(\w)', '$1 $2').alias('d')).show()
+---+
| d|
+---+
|a b|
+---+
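For comparison, here is a minimal sketch of the same substitution using Python's standard re module (an illustration only, not part of the Spark solution). The replacement syntax differs: Spark's regexp_replace follows Java regex conventions and writes backreferences as $1, while Python uses \1 or \g<1>. Also note that, like regexp_replace, re.sub replaces every match, so on a longer input such as 'abcd' the result differs from the question's regexp_extract approach, which only looks at the first match.

```python
import re

pattern = r"(\w)(\w)"

# \1 and \2 refer to the first and second capture groups
# (Spark's regexp_replace expresses the same thing as "$1 $2").
two_chars = re.sub(pattern, r"\1 \2", "ab")
named_form = re.sub(pattern, r"\g<1> \g<2>", "ab")  # equivalent \g<n> syntax

# re.sub rewrites every non-overlapping match, as regexp_replace does...
all_matches = re.sub(pattern, r"\1 \2", "abcd")
# ...while count=1 restricts the substitution to the first match only.
first_match_only = re.sub(pattern, r"\1 \2", "abcd", count=1)

print(two_chars, named_form, all_matches, first_match_only, sep="\n")
```

If you need the Spark version to touch only the first pair on longer strings, the pattern itself has to be adjusted (for example anchored), since regexp_replace has no count argument.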