Search code examples
rdplyrplyr

How to pivot_longer a set of multiple columns? and How to go back from that long format to original wide?


If I have the following data:

D = tibble::tribble(
  ~firm, ~ind, ~var1_1, ~var1_2, ~op2_1, ~op2_2,
  "A",     1,     10,     11,     11,     12,
  "A",     2,     12,     13,     13,     14,
  "B",     1,     14,     15,     15,     16,
  "B",     2,     16,     17,     17,     18,
  "C",     1,     18,     19,     19,     20,
  "C",     2,     20,     21,     21,     22,
)

How can I pivot_longer() var1 and var2 having "_*" as year indicator?

I mean, I would like have something like this:

D %>%
  pivot_longer(var1_1:op2_2,
  names_to = c(".value", "year"),
  names_pattern = "(.*)_(.*)",
  values_to = c("var1, var2")
  )
# A tibble: 12 x 5
   firm    ind year   var1  op2
   <chr> <dbl> <chr> <dbl> <dbl>
 1 A         1 1        10    11
 2 A         1 2        11    12
 3 A         2 1        12    13
 4 A         2 2        13    14
 5 B         1 1        14    15
 6 B         1 2        15    16
 7 B         2 1        16    17
 8 B         2 2        17    18
 9 C         1 1        18    19
10 C         1 2        19    20
11 C         2 1        20    21
12 C         2 2        21    22

I'm achieving the desired result using the code above. However in my real case I'm dealing with more than 30 variables and 10 years. Then, using values_to isn't practical and clean. I'd like the code read first part of variable name as the desired new variable name. Since initially all columns to be pivoted are structured like "varname_year".

Besides, once I get the new data format into long, I might need to go back to wide-format keeping the initial data structure.


Solution

  • We can use one of the select_helpers

    library(dplyr)
    library(tidyr)
    library(stringr)
    Dlong <- D %>%
              pivot_longer(cols = starts_with('var'), 
                 names_to = c(".value", "year"), names_sep = "_")
    

    From the 'long' format, change to 'wide' with pivot_wider

    Dlong %>%
        pivot_wider(names_from = ind, values_from = str_c("var", 1:2))