Tags: pyspark, apache-spark-sql, special-characters, azure-databricks

Select spark dataframe column with special character in it using selectExpr


I am in a scenario where my column name is Município, with an accent on the letter í.

My selectExpr command is failing because of it. Is there a way to fix it? Basically I have something like the following expression:

.selectExpr("...CAST (Município as string) as Município...")

What I really want is to keep the column with the same name it arrived with, so that in the future I won't run into this kind of problem with different tables/files.

How can I make a Spark DataFrame accept accents or other special characters in column names?


Solution

  • You can wrap your column name in backticks. For example, if you had the following schema:

    df.printSchema()
    #root
    # |-- Município: long (nullable = true)
    

    Express the column name with the special character wrapped in backticks:

    df2 = df.selectExpr("CAST (`Município` as string) as `Município`")
    df2.printSchema()
    #root
    # |-- Município: string (nullable = true)
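
    As a minimal end-to-end sketch (assuming a local SparkSession; the sample data is made up for illustration), the backtick quoting looks like this:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("backticks").getOrCreate()

    # Hypothetical sample data; the column name contains an accented character.
    df = spark.createDataFrame([(1,), (2,)], ["Município"])

    # Backticks quote the identifier so the SQL parser accepts the accent.
    df2 = df.selectExpr("CAST(`Município` AS string) AS `Município`")
    df2.printSchema()
    ```

    Note that the backticks are only needed where the name is parsed as SQL (`selectExpr`, `expr`, `spark.sql`). With the DataFrame API, e.g. `df["Município"].cast("string")`, no quoting is required, since the name is passed as a plain string rather than parsed.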