Tags: python, apache-spark, pyspark

How to handle accented letters in PySpark


I have a PySpark DataFrame in which I need to apply "translate" to a column. I have the code below:

df1 = df.withColumn("Description", F.split(F.trim(F.regexp_replace(F.regexp_replace(
        F.lower(F.col("Short_Description")), r"[/\[/\]/\{}!-]", ' '), ' +', ' ')), ' '))
        
df2 = df1.withColumn("Description", F.translate('Description', 'ãäöüẞáäčďéěíĺľňóôŕšťúůýžÄÖÜẞÁÄČĎÉĚÍĹĽŇÓÔŔŠŤÚŮÝŽ',
                                       'aaousaacdeeillnoorstuuyzAOUSAACDEEILLNOORSTUUYZ'))
                                       
df3 = df2.withColumn('Description', F.explode(F.col('Description')))

I'm getting a data type mismatch error: argument 1 requires string type, 'Description' is of array<string> type.

I need to handle the accented letters in the Description column.

Please let me know how to solve this.


Solution

  • Try using the Spark higher-order function transform to iterate through the array and apply translate to each element, as in the example below.

    Example:

    from pyspark.sql.functions import *
    
    df = spark.createDataFrame([(1, ['123a', '2431abc'])], ['id', 'description'])
    
    # transform applies translate to every element of the array, deleting 'a', 'b', 'c'
    df.withColumn("description", expr("""transform(description, x -> translate(x,'abc',''))""")).show()
    
    #result:
    #+---+-----------+
    #| id|description|
    #+---+-----------+
    #|  1|[123, 2431]|
    #+---+-----------+
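
    Applied to your case, a minimal sketch assuming df1 and the column names from your question, reusing the same accent mapping: wrap the translate call in transform so it runs on each element of the Description array before the explode.

    import pyspark.sql.functions as F

    # same character mapping as in the question
    src = 'ãäöüẞáäčďéěíĺľňóôŕšťúůýžÄÖÜẞÁÄČĎÉĚÍĹĽŇÓÔŔŠŤÚŮÝŽ'
    dst = 'aaousaacdeeillnoorstuuyzAOUSAACDEEILLNOORSTUUYZ'

    # translate each element of the Description array, then explode as before
    df2 = df1.withColumn(
        "Description",
        F.expr(f"transform(Description, x -> translate(x, '{src}', '{dst}'))"))
    df3 = df2.withColumn("Description", F.explode(F.col("Description")))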