Tags: r, apache-spark, sparkr

Rename multiple columns at once in SparkR DataFrame


How can I rename multiple columns in a SparkR DataFrame at one time instead of calling withColumnRenamed() multiple times? For example, say I want to rename the surname and dates columns in the DataFrame below to name and birthdays. How would I do so without calling withColumnRenamed() twice?

team <- data.frame(name = c("Thomas", "Bill", "George", "Randall"),
  surname = c("Johnson", "Clark", "Williams", "Yosimite"),
  dates = c('2017-01-05', '2017-02-23', '2017-03-16', '2017-04-08'))
team <- createDataFrame(team)

team <- withColumnRenamed(team, 'surname', 'name')
team <- withColumnRenamed(team, 'dates', 'birthdays')

Solution

  • Standard R methods apply here - you can simply reassign colnames:

    colnames(team) <- c("name", "name", "birthdays")
    team
    
    SparkDataFrame[name:string, name:string, birthdays:string]
    

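    If you know the order of the columns, you can skip the full list and replace only the matching names (the new names are assigned in the DataFrame's column order, not in the order of the lookup vector):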

    colnames(team)[colnames(team) %in% c("surname", "dates")] <- c("name", "birthdays")
    

    You'll probably want to avoid duplicate names, though.
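
If the column order is not known in advance, one possible variation is to look up each new name by its old name. The sketch below is only an illustration: remap_names is a hypothetical helper (not part of SparkR), and "last_name" is an assumed target name, chosen here to avoid the duplicate "name" column. The helper works on a plain character vector, so it is demonstrated on the column names from the question.

```r
# Hypothetical helper (not a SparkR function): replace old column names with
# new ones using a named character vector of the form c(old_name = "new_name").
remap_names <- function(current, renames) {
  hits <- current %in% names(renames)              # columns that have a mapping
  current[hits] <- unname(renames[current[hits]])  # look up by old name, not position
  current
}

# Assumed mapping for illustration; the order of the entries does not matter.
renames <- c(dates = "birthdays", surname = "last_name")

# Column names of the DataFrame from the question.
new_names <- remap_names(c("name", "surname", "dates"), renames)
print(new_names)

# With the SparkDataFrame from the question, this could be applied as:
# colnames(team) <- remap_names(colnames(team), renames)
```

Because each new name is looked up by the old name, the result does not depend on where the columns sit in the DataFrame.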