r sorting dataframe vector variable-names

Sort a dataframe by columns with their names passed as a vector


I need to sort multiple dataframes by a list of columns whose names contain non-alphabetic characters. For a single dataset I'd use this well-known solution, with a workaround for blanks and other special characters in the variable name:

df_sorted = df[with(df, order(varname, xtfrm(df[,"varname with blanks and\\slashes"]) ) ), ]

But for multiple datasets it's more suitable to have a function with a vector of column names as an input:

sort_by_columns = function(col_names){...}
df_sorted = sort_by_columns(col_names = c("varname","varname with blanks and\\slashes"))

How do I transform a vector into an argument suitable for order() inside my function?


Solution

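  • You didn't provide an example data set, so I'll use the built-in iris data. My approach would be dplyr with tidy evaluation: syms() turns the character vector into a list of symbols, and !!! splices them into arrange() as separate arguments, so names with spaces or backslashes need no special quoting.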

    library(dplyr)    
    library(datasets)
    data(iris)
    
    # Rename one column so that its name contains a space and a backslash
    # (the backslash must be escaped as \\ inside the string)
    iris <- iris %>%
        rename('sepal \\length' = 'Sepal.Length')
    
    # Data will be sorted in the order listed
    col_names <- c('sepal \\length', 'Sepal.Width')
    
    data_sorted <- iris %>%
        arrange(!!!syms(col_names))
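
    To see what !!!syms(col_names) hands to arrange(), the call can be built without evaluating it. This is only an illustrative sketch: it assumes the rlang package (which dplyr builds on) is installed, and built_call is a name made up for this example.

    ```r
    library(rlang)

    col_names <- c("sepal \\length", "Sepal.Width")

    # syms() turns each string into a symbol (a bare column name)
    symbols <- syms(col_names)

    # expr() captures the call instead of running it; !!! splices the
    # symbols in as separate arguments, as it does inside arrange()
    built_call <- expr(arrange(iris, !!!symbols))
    print(built_call)
    ```

    The printed call lists each column as its own argument, with the non-syntactic name wrapped in backticks.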
    

    To turn this into a function:

    sort_by_columns <- function(data, col_names){
      data_sorted <- data %>%
          arrange(!!!syms(col_names))
    
      return(data_sorted)
    }
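
    If you would rather stay with order(), as in the question, one possible base R sketch passes the selected columns to it through do.call(). The function name sort_by_columns_base and the toy data frame are made up for illustration and are not part of the original question or answer.

    ```r
    # Hypothetical base R counterpart of sort_by_columns()
    sort_by_columns_base <- function(data, col_names) {
      # data[col_names] is a list of columns; unname() keeps the column
      # names from being matched to order()'s own arguments
      row_order <- do.call(order, unname(as.list(data[col_names])))
      data[row_order, , drop = FALSE]
    }

    # Toy data; check.names = FALSE keeps the non-syntactic name intact
    df <- data.frame(
      "varname" = c(2, 1, 2, 1),
      "varname with blanks and\\slashes" = c("b", "d", "a", "c"),
      check.names = FALSE
    )

    sorted <- sort_by_columns_base(
      df,
      col_names = c("varname", "varname with blanks and\\slashes")
    )
    print(sorted)
    ```

    Because the columns are picked by name with [, no quoting workaround is needed for blanks or backslashes, and the same function can be reused across data frames.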