apache-spark apache-spark-sql apache-spark-dataset

How to lower the case of column names of a data frame but not its values?

How can I lowercase the column names of a data frame, but not its values, using raw Spark SQL and DataFrame methods?

Input data frame (imagine I have hundreds of these columns in uppercase):

NAME  | COUNTRY | SRC        | CITY        | DEBIT
--------------------------------------------------
"foo" | "NZ"    | salary     | "Auckland"  | 15.0
"bar" | "Aus"   | investment | "Melbourne" | 12.5

Target data frame:

name  | country | src        | city        | debit
--------------------------------------------------
"foo" | "NZ"    | salary     | "Auckland"  | 15.0
"bar" | "Aus"   | investment | "Melbourne" | 12.5

Solution

For Java 8:

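import java.util.Locale;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.col;

Dataset<Row> input;  // the source data frame; assign it before the loop
for (StructField field : input.schema().fields()) {
   // Rename the top-level column to its lowercase form.
   String newName = field.name().toLowerCase(Locale.ROOT);
   input = input.withColumnRenamed(field.name(), newName);
   if (field.dataType() instanceof StructType) {
       // Nested field names: lowercase the struct's JSON schema and cast to it.
       StructType newStructType = (StructType) StructType.fromJson(field.dataType().json().toLowerCase(Locale.ROOT));
       input = input.withColumn(newName, col(newName).cast(newStructType));
   }
}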
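
The question also asks about raw Spark SQL. One possible approach, sketched below, is to generate a SELECT statement that aliases every column to its lowercase name and run it against a temporary view. The class name `LowerCaseSelect`, its methods, and the view name `people` are hypothetical names chosen for illustration. The helper is plain Java and only builds the query string, so it does not need Spark to compile.

```java
import java.util.Locale;
import java.util.StringJoiner;

/** Hypothetical helper: builds a SELECT that aliases each column to its lowercase name. */
final class LowerCaseSelect {

    private LowerCaseSelect() {}

    /** Wraps an identifier in backticks, doubling any backtick inside it. */
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    /**
     * @param viewName name of a temporary view (assumed to be registered already)
     * @param columns  top-level column names, for example from Dataset.columns()
     */
    static String build(String viewName, String[] columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("at least one column is required");
        }
        StringJoiner select = new StringJoiner(", ", "SELECT ", " FROM " + quote(viewName));
        for (String column : columns) {
            // Alias the original name to its lowercase form; values are untouched.
            select.add(quote(column) + " AS " + quote(column.toLowerCase(Locale.ROOT)));
        }
        return select.toString();
    }
}
```

With Spark on the classpath and an existing SparkSession named spark (an assumption), this could be used roughly as follows:

input.createOrReplaceTempView("people");
Dataset<Row> lowered = spark.sql(LowerCaseSelect.build("people", input.columns()));

This only renames top-level columns. Field names inside struct columns are left as they are and would still need a cast like the one in the loop above.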