scala, apache-spark, apache-spark-sql

Replacing whitespace in all column names in a Spark DataFrame


I have a Spark DataFrame with whitespace in some of its column names, which has to be replaced with underscores.

I know a single column can be renamed using withColumnRenamed() in Spark SQL, but to rename n columns, this function has to be chained n times (to my knowledge).

To automate this, I have tried:

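val old_names = df.columns          // array of old column names (columns takes no parentheses in Scala)

// array of new column names, with each whitespace character replaced by an underscore;
// replaceAll leaves names without whitespace unchanged, so no explicit check is needed
val new_names = old_names.map { x =>
   x.replaceAll("\\s", "_")
}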

Now, how do I replace df's header with new_names?


Solution

    var newDf = df
    for (col <- df.columns) {
      newDf = newDf.withColumnRenamed(col, col.replaceAll("\\s", "_"))
    }
    

    You can wrap this in a method so that the mutable var stays local and does not pollute the surrounding code.
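
    As a rough sketch of that encapsulation, the loop could be written as a fold, which avoids the var altogether. The object name ColumnNames and the helpers sanitizeName, withSanitizedColumns and withSanitizedColumnsViaToDF are hypothetical names chosen for illustration; only withColumnRenamed, columns and toDF come from Spark's DataFrame API.

```scala
import org.apache.spark.sql.DataFrame

object ColumnNames {
  // Hypothetical helper: replaces each whitespace character with an underscore.
  def sanitizeName(name: String): String = name.replaceAll("\\s", "_")

  // Folds over the original column names, renaming one column per step,
  // so no mutable var is needed.
  def withSanitizedColumns(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumnRenamed(name, sanitizeName(name))
    }

  // Alternative: supply all new names at once, matched to columns by position.
  def withSanitizedColumnsViaToDF(df: DataFrame): DataFrame =
    df.toDF(df.columns.map(sanitizeName): _*)
}
```

    Usage would then be a single call such as val cleaned = ColumnNames.withSanitizedColumns(df). Note that if two names differ only in whitespace versus underscore (for example "a b" and "a_b"), the result will contain duplicate column names, so it may be worth checking for collisions first.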