Search code examples
rformatstackmelt

Restructuring data set by stacking similar variables in R


I have the following variables as part of a bigger data set. And each 3 successive variables measured the same (e.g. the first 3 variables c_0064, c_0065, c_0066 measured 3 brands the participant knows, the second 3 variables v_159_1, v_159_2, v_159_3 measured the attitude the participant has on each of the before mentioned brands, and so on. I displayed just the first and last columns in the data set. After the column v_159_3 it continues actually with v_160_1, v_160_2, v_160_3, v_161_1... until it reaches the columns v_182_1, v_182_2, v_182_3.

structure(list(lfdn = c(4, 6, 7, 8, 9, 11, 12, 19), c_0064 = c("x", 
"t", "x", "x", "t", "x", "z", "z"), c_0065 = c("z", "z", "z", 
"f", "f", "f", "t", "t"), c_0066 = c("x", "x", "x", "a", "f", 
"t", "z", "b"), v_159_1 = c(1, 1, 3, 2, 2, 5, 4, 3), v_159_2 = c(3, 
3, 3, 3, 3, 2, 5, 1), v_159_3 = c(5, 5, 1, 4, 4, 1, 2, 2), v_182_1 = c(1, 
1, 5, 5, 4, 4, 4, 4), v_182_2 = c(4, 2, 2, 2, 2, 3, 1, 5), v_182_3 = c(5, 
4, 5, 1, 2, 5, 2, 2)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))


> df
# A tibble: 8 x 10
   lfdn c_0064 c_0065 c_0066 v_159_1 v_159_2 v_159_3 v_182_1 v_182_2 v_182_3
  <dbl> <chr>  <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1     4 x      z      x            1       3       5       1       4       5
2     6 t      z      x            1       3       5       1       2       4
3     7 x      z      x            3       3       1       5       2       5
4     8 x      f      a            2       3       4       5       2       1
5     9 t      f      f            2       3       4       4       2       2
6    11 x      f      t            5       2       1       4       3       5
7    12 z      t      z            4       5       2       4       1       2
8    19 z      t      b            3       1       2       4       5       2

And the goal is to restructure/stack always the 3 similar columns like the following:

structure(list(lfdn = c(4, 6, 7, 8, 9, 11, 12, 19, 4, 6, 7, 8, 
9, 11, 12, 19, 4, 6, 7, 8, 9, 11, 12, 19), c_0064_65_66 = c("x", 
"t", "x", "x", "t", "x", "z", "z", "z", "z", "z", "f", "f", "f", 
"t", "t", "x", "x", "x", "a", "f", "t", "z", "b"), v_159_1_2_3 = c(1, 
1, 3, 2, 2, 5, 4, 3, 3, 3, 3, 3, 3, 2, 5, 1, 5, 5, 1, 4, 4, 1, 
2, 2), v_181_1_2_3 = c(1, 1, 5, 5, 4, 4, 4, 4, 4, 2, 2, 2, 2, 
3, 1, 5, 5, 4, 5, 1, 2, 5, 2, 2)), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame"))

> dflong
# A tibble: 24 x 4
    lfdn c_0064_65_66 v_159_1_2_3 v_181_1_2_3
   <dbl> <chr>              <dbl>       <dbl>
 1     4 x                      1           1
 2     6 t                      1           1
 3     7 x                      3           5
 4     8 x                      2           5
 5     9 t                      2           4
 6    11 x                      5           4
 7    12 z                      4           4
 8    19 z                      3           4
 9     4 z                      3           4
10     6 z                      3           2
# ... with 14 more rows

I failed already to melt the data, so the only procedure that came to my mind was to use stack command and stack every following 3 variables like stack(df, select=c("c_0064", "c_0065", "c_0066")) and then put these stacked variables together in the end. But I hope there is a much more economical way to do it since I have much more "repeating" variables in the data set beside the displayed ones.


Solution

  • You can use pivot_longer with names_pattern. Based on the column names in your data use the pattern to accurately capture the column names.

    tidyr::pivot_longer(df, cols = -lfdn, 
                        names_to = '.value', names_pattern = '(c|[a-z]_\\d+)')
    
    #    lfdn c     v_159 v_182
    #   <dbl> <chr> <dbl> <dbl>
    # 1     4 x         1     1
    # 2     4 z         3     4
    # 3     4 x         5     5
    # 4     6 t         1     1
    # 5     6 z         3     2
    # 6     6 x         5     4
    # 7     7 x         3     5
    # 8     7 z         3     2
    # 9     7 x         1     5
    #10     8 x         2     5
    # … with 14 more rows