Tags: apache-spark, apache-spark-sql, apache-spark-dataset

How to lowercase the column names of a DataFrame but not its values?


How can I lowercase the column names of a DataFrame, but not its values, using raw Spark SQL and DataFrame methods?

Input DataFrame (imagine it has hundreds of such uppercase columns):

NAME | COUNTRY | SRC        | CITY       | DEBIT
---------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5

Target DataFrame:

name | country | src        | city       | debit
------------------------------------------------
"foo"| "NZ"    | salary     | "Auckland" | 15.0
"bar"| "Aus"   | investment | "Melbourne"| 12.5

Solution

  • for Java 8

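    import static org.apache.spark.sql.functions.col;
    import java.util.Locale;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    Dataset<Row> input; // must be assigned (e.g. from a reader) before the loop
    // schema() is evaluated once, so the loop walks the original field names
    for (StructField field : input.schema().fields()) {
        String newName = field.name().toLowerCase(Locale.ROOT);
        input = input.withColumnRenamed(field.name(), newName);
        if (field.dataType() instanceof StructType) {
            // Lowercase nested field names by round-tripping the type through JSON. Caveat: this lowercases the
            // whole JSON string, so structs holding arrays or maps (camelCase keys such as "elementType") will not parse.
            StructType newStructType = (StructType) StructType.fromJson(field.dataType().json().toLowerCase(Locale.ROOT));
            input = input.withColumn(newName, col(newName).cast(newStructType));
        }
    }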
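
  • for raw Spark SQL, or a single-pass rename (a sketch, not tested against a cluster)

The loop above issues one withColumnRenamed call per column. With hundreds of columns it may be simpler to compute all the new names up front. The helper below is a minimal, Spark-free sketch of that idea. The class name ColumnLowerCaser, its two methods, and the view name input_view are hypothetical names introduced here for illustration. It assumes column names contain no backticks, and it only touches top-level names, not fields nested inside structs.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper: pure Java, no Spark dependency.
final class ColumnLowerCaser {

    private ColumnLowerCaser() {}

    // Lowercases every name, keeping the original order.
    // Fails fast if two names collide once lowercased (e.g. "Name" and "NAME").
    static String[] lowerCased(String[] columns) {
        Set<String> seen = new LinkedHashSet<>();
        for (String column : columns) {
            String lower = column.toLowerCase(Locale.ROOT);
            if (!seen.add(lower)) {
                throw new IllegalArgumentException(
                        "Duplicate column after lowercasing: " + lower);
            }
        }
        return seen.toArray(new String[0]);
    }

    // Builds a statement of the form: SELECT `NAME` AS `name`, ... FROM view
    // Assumes the column names themselves contain no backticks.
    static String selectLowerCased(String[] columns, String view) {
        String[] lower = lowerCased(columns);
        StringJoiner projection = new StringJoiner(", ");
        for (int i = 0; i < columns.length; i++) {
            projection.add("`" + columns[i] + "` AS `" + lower[i] + "`");
        }
        return "SELECT " + projection + " FROM " + view;
    }
}
```

Assuming a SparkSession named spark, this could be wired up either as input.toDF(ColumnLowerCaser.lowerCased(input.columns())) for the DataFrame route, or as input.createOrReplaceTempView("input_view") followed by spark.sql(ColumnLowerCaser.selectLowerCased(input.columns(), "input_view")) for the SQL route. Both only rewrite the column names in the schema and leave the row values untouched.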