Search code examples
rdataframedplyrtibble

Organize columns with numerical info in colnames via dplyr relocate


I have a large amount of annual data in a data frame that will only get larger. I would like to organize it, grouping columns according to the year, which is included in the column names.

Base data:

dput(dat)
structure(list(id = 1:2, quantity = 3:4, avg_2002 = 5:6, avg_2003 = 7:8, 
    avg_2020 = 9:10, rev_2002 = c(15L, 24L), rev_2003 = c(21L, 
    32L), rev_2020 = c(27L, 40L)), row.names = c(NA, -2L), class = "data.frame")

What I would like to do is have all of the columns with, say, "2002" in them organized together, followed by the "2003" columns and so on...I know that relocate in dplyr is a good way to do it so I did the following:

dat <- tibble(dat)
dat <- dat %>%
  relocate(grep("2002$", colnames(dat), value = TRUE),
           .before = grep("2003$", colnames(dat), value = TRUE)) %>%
  relocate(grep("2003$", colnames(dat), value = TRUE),
           .after = grep("2002$", colnames(dat), value = TRUE))

which produces the desired result for my toy dataset:

     id quantity avg_2002 rev_2002 avg_2003 rev_2003 avg_2020 rev_2020
  <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
1     1        3        5       15        7       21        9       27
2     2        4        6       24        8       32       10       40

My question is this:

  1. How do I generalize the code above so that I don't have to keep adding relocate statements ad nauseum?
  2. Is there a better way to do this task without using dplyr::relocate?

Any suggestions are much appreciated. Thanks!


Solution

  • We may use select - extract the numeric part of the column names, order it and use that index in select to reorder

    library(dplyr)
    dat %>% 
       select(id, quantity, order(readr::parse_number(names(.)[-(1:2)])) + 2)
    

    -output

    # A tibble: 2 × 8
         id quantity avg_2002 rev_2002 avg_2003 rev_2003 avg_2020 rev_2020
      <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
    1     1        3        5       15        7       21        9       27
    2     2        4        6       24        8       32       10       40