Tags: r, dataframe, dplyr, data.table, tidyverse

Reshape long dataframe to wide and rename new columns by using one column as prefix


Given a dataframe df as follows:

df <- structure(list(code = c("M0000273", "M0000357", "M0000545", "M0000273", 
"M0000357", "M0000545"), name = c("industry", "agriculture", 
"service", "industry", "agriculture", "service"), act_value = c(16.78, 
9.26, 49.38, 35.74, 88.42, 68.26), pred_value = c(17.78, 10.26, 
50.38, 36.74, 89.42, 69.26), year = c(2019L, 2019L, 2019L, 2020L, 
2020L, 2020L)), class = "data.frame", row.names = c(NA, -6L))

df:

      code        name act_value pred_value year
1 M0000273    industry     16.78      17.78 2019
2 M0000357 agriculture      9.26      10.26 2019
3 M0000545     service     49.38      50.38 2019
4 M0000273    industry     35.74      36.74 2020
5 M0000357 agriculture     88.42      89.42 2020
6 M0000545     service     68.26      69.26 2020

I would like to use code and name as index columns, convert act_value and pred_value from long to wide, and finally name the new columns by pasting the year as a prefix.

The expected result should have the following format:

      code        name  2019_act_value  2019_pred_value  2020_act_value  2020_pred_value
1 M0000273    industry           16.78            17.78           35.74            36.74
2 M0000357 agriculture            9.26            10.26           88.42            89.42
3 M0000545     service           49.38            50.38           68.26            69.26

My trial code:

reshape(df, idvar = c('code', 'name'), timevar = 'year', direction = 'wide')

How could I achieve that correctly using R? Thanks.
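
For reference, the reshape() call above does produce the wide shape; it just names the new columns with the year as a suffix after a dot (act_value.2019 and so on). A minimal base R sketch of fixing only the names afterwards, assuming every reshaped column ends in a four-digit year; the variable name wide and the regular expression are illustrative choices, not part of the original post:

```r
# Rebuild the example data so the snippet is self-contained.
df <- data.frame(
  code = rep(c("M0000273", "M0000357", "M0000545"), times = 2),
  name = rep(c("industry", "agriculture", "service"), times = 2),
  act_value = c(16.78, 9.26, 49.38, 35.74, 88.42, 68.26),
  pred_value = c(17.78, 10.26, 50.38, 36.74, 89.42, 69.26),
  year = rep(c(2019L, 2020L), each = 3)
)

# Same call as the trial code: new columns are named <value>.<year>.
wide <- reshape(df, idvar = c("code", "name"), timevar = "year",
                direction = "wide")

# Assumption: reshaped columns end in ".<4-digit year>". Swap the two parts
# so the year becomes a prefix; "code" and "name" do not match and are kept.
names(wide) <- sub("^(.+)\\.([0-9]{4})$", "\\2_\\1", names(wide))

names(wide)
```

Because the names start with a digit, they have to be quoted or wrapped in backticks when used, for example wide[["2019_act_value"]].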


Solution

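  • We can use tidyr::pivot_wider to do this, with names_glue building the year-prefixed names. I wouldn't recommend this naming convention, though: names that start with a digit are non-syntactic in R and have to be wrapped in backticks. If you drop names_glue, you get the same result with the tidier default format of year as a suffix (for example act_value_2019).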

    library(tidyr)
    
    pivot_wider(df,
                names_from = year,
                names_glue = "{year}_{.value}",
                values_from = ends_with("value"))
    #> # A tibble: 3 × 6
    #>   code     name        `2019_act_value` `2020_act_value` `2019_pred_value`
    #>   <chr>    <chr>                  <dbl>            <dbl>             <dbl>
    #> 1 M0000273 industry               16.8              35.7              17.8
    #> 2 M0000357 agriculture             9.26             88.4              10.3
    #> 3 M0000545 service                49.4              68.3              50.4
    #> # … with 1 more variable: 2020_pred_value <dbl>
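
  • Note that the output above groups the columns by value (both act_value columns first), whereas the expected result groups them by year. A minimal sketch of getting the year-grouped order, assuming tidyr 1.2.0 or later, which provides the names_vary argument; the variable name wide is an illustrative choice:

```r
library(tidyr)

# Rebuild the example data so the snippet is self-contained.
df <- data.frame(
  code = rep(c("M0000273", "M0000357", "M0000545"), times = 2),
  name = rep(c("industry", "agriculture", "service"), times = 2),
  act_value = c(16.78, 9.26, 49.38, 35.74, 88.42, 68.26),
  pred_value = c(17.78, 10.26, 50.38, 36.74, 89.42, 69.26),
  year = rep(c(2019L, 2020L), each = 3)
)

# Assumption: tidyr >= 1.2.0. names_vary = "slowest" keeps all value
# columns for one year together before moving on to the next year.
wide <- pivot_wider(df,
                    names_from = year,
                    values_from = c(act_value, pred_value),
                    names_glue = "{year}_{.value}",
                    names_vary = "slowest")

names(wide)
```

    With older tidyr versions the same order can be had by reordering the columns after pivoting.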