Tags: r, list, sorting, dplyr

Controlling the ordering of the output of dplyr::group_split


I have split up a dataframe into a list of sub-dataframes (sub_dfs) using dplyr::group_split. I then create a separate plot for each of these sub_dfs. However, I want to create these plots in a specific order.

Every sub_df has a column called 'rank'. I want to create my plots in ascending order of:

(sub_df$rank)[1]

How can I achieve this?

I have tried to find an argument of dplyr::group_split that allows me to control the ordering of the list it outputs, but I have had no luck.

I have also tried to concisely re-order the list after its creation. I can extract the relevant ranks from the list by:

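  # Take the first value of the rank column from one sub_df
  extract_rank <- function(sub_df) as.numeric(sub_df$rank[1])
  # sapply returns a plain numeric vector, one rank per sub_df
  ranking <- sapply(list, extract_rank)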

...and ranking = c(10, 8, 3, 4, 6, 1...) tells me that the first sub_df in list should actually be in the 10th position etc.

Can I somehow use ranking to re-order my list of sub_dfs?


Solution

  • Make a factor variable that you can split by and has its levels in the order you want. Here's a simple example with mtcars:

    library(dplyr)  # the .by argument needs dplyr 1.1.0 or later
    mtcars |>
      summarize(mpg = mean(mpg), .by = cyl) |> 
      mutate(cyl_split = factor(cyl, levels = c(4, 8, 6))) |> 
      group_split(cyl_split)
    # <list_of<
    #   tbl_df<
    #     cyl      : double
    #     mpg      : double
    #     cyl_split: factor<37885>
    #   >
    # [3]>
    # [[1]]
    # # A tibble: 1 × 3
    #     cyl   mpg cyl_split
    #   <dbl> <dbl> <fct>    
    # 1     4  26.7 4        
    # 
    # [[2]]
    # # A tibble: 1 × 3
    #     cyl   mpg cyl_split
    #   <dbl> <dbl> <fct>    
    # 1     8  15.1 8        
    # 
    # [[3]]
    # # A tibble: 1 × 3
    #     cyl   mpg cyl_split
    #   <dbl> <dbl> <fct>    
    # 1     6  19.7 6       
    

    You can order the factor levels manually, or use a function like reorder to order them based on a function of another column.

    Note that it is not sufficient to order the rows of the data frame with arrange. group_split returns the groups sorted by the grouping column, so you need to set the levels of the factor column you split by to the order you want.
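
Applied to the question's setup, the idea might look like the sketch below. The data frame `df`, its `group` and `value` columns, and the helper `first_groups` are hypothetical names made up for illustration; only the `rank` column comes from the question. The first approach sets the factor levels explicitly from `rank` (an alternative to reorder), and the second reorders an already-split list with base R's order, which is one way to use the `ranking` vector from the question.

```r
library(dplyr)

# Hypothetical data: three groups, each carrying a constant `rank`.
df <- tibble(
  group = c("a", "a", "b", "b", "c", "c"),
  rank  = c(3, 3, 1, 1, 2, 2),
  value = 1:6
)

# Approach 1: split by a factor whose levels are listed in ascending rank.
# group[order(rank)] sorts the group labels by rank; unique() keeps the
# first occurrence of each label, giving the levels in rank order.
by_factor <- df |>
  mutate(group_f = factor(group, levels = unique(group[order(rank)]))) |>
  group_split(group_f)

# Approach 2: split first, then reorder the resulting list.
sub_dfs <- group_split(df, group)
ranking <- sapply(sub_dfs, function(sub_df) sub_df$rank[1])
# order(ranking) gives the positions of the elements from lowest to highest rank.
by_order <- sub_dfs[order(ranking)]

# Hypothetical helper: report which group each list element holds.
first_groups <- function(x) sapply(x, function(sub_df) sub_df$group[1])
first_groups(by_factor)
first_groups(by_order)
```

Both lists should come out with the rank-1 group first. The second approach leaves the data untouched and only permutes the list, which may be simpler if the split has already been done.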