Tags: pyspark, apache-spark-sql, special-characters, azure-databricks

Select spark dataframe column with special character in it using selectExpr


I am in a scenario where my column name is Município, with an accent on the letter í.

My selectExpr command is failing because of it. Is there a way to fix it? Basically I have something like the following expression:

.selectExpr("...CAST (Município as string) as Município...")

What I really want is to keep the column with the same name it arrived with, so that in the future I won't run into this kind of problem with different tables/files.

How can I make a Spark DataFrame accept accents or other special characters in column names?


Solution

  • You can wrap your column name in backticks. For example, if you had the following schema:

    df.printSchema()
    #root
    # |-- Município: long (nullable = true)
    

    Express the column name with the special character wrapped in backticks:

    df2 = df.selectExpr("CAST (`Município` as string) as `Município`")
    df2.printSchema()
    #root
    # |-- Município: string (nullable = true)
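
    As a minimal end-to-end sketch (assuming a local SparkSession; the sample data is made up for illustration), the backtick quoting looks like this:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("backticks").getOrCreate()

    # Hypothetical sample data; the column name contains an accented character.
    df = spark.createDataFrame([(1,), (2,)], ["Município"])

    # Backticks quote the identifier so the SQL parser accepts the accent.
    df2 = df.selectExpr("CAST(`Município` AS string) AS `Município`")
    df2.printSchema()
    ```

    Note that the backticks are only needed where the name is parsed as SQL (`selectExpr`, `expr`, `spark.sql`). With the DataFrame API, e.g. `df["Município"].cast("string")`, no quoting is required, since the name is passed as a plain string rather than parsed.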