Search code examples
rdataframemergetidyversepurrr

Change purrr reduce column naming to numeric?


When using purrr::reduce() to left join a large list of data frames the default is to names the columns with .x, .y, .x.x, .y.y etc..

Is it possible to change this behaviour to obtain column names with numeric endings? e.g.

.1, .2, .3, .4 ?

# test data
data.frame(id= rep(letters[1:5],3,20), visits = rep(1:4,each=5), data =rep(1:5,3,20)) %>%
  # split df into a list for each visit
  group_split(visits) %>%
  # left join each nhs_joint_side
  reduce(left_join, by = c("id"))


# A tibble: 5 x 9
  id    visits.x data.x visits.y data.y visits.x.x data.x.x visits.y.y data.y.y
  <chr>    <int>  <int>    <int>  <int>      <int>    <int>      <int>    <int>
1 a            1      1        2      1          3        1          4        1
2 b            1      2        2      2          3        2          4        2
3 c            1      3        2      3          3        3          4        3
4 d            1      4        2      4          3        4          4        4
5 e            1      5        2      5          3        5          4        5

# ideal data
# A tibble: 5 x 9
  id    visits_1  data_1  visits_2  data_2 visits_3    data_3    visits_4  data_4
  <chr>    <int>  <int>    <int>  <int>      <int>    <int>      <int>    <int>
1 a            1      1        2      1          3        1          4        1
2 b            1      2        2      2          3        2          4        2
3 c            1      3        2      3          3        3          4        3
4 d            1      4        2      4          3        4          4        4
5 e            1      5        2      5          3        5          4        5

Solution

  • You can rename the columns first, and then merge:

    df %>%
      group_split(visits) %>%
      imap(function(x, y) x %>% 
             rename_with(~paste(., y, sep = '_'), -id)) %>%
      reduce(left_join, by = 'id')
    

    output

      id    visits_1 data_1 visits_2 data_2 visits_3 data_3 visits_4 data_4
      <chr>    <int>  <int>    <int>  <int>    <int>  <int>    <int>  <int>
    1 a            1      1        2      1        3      1        4      1
    2 b            1      2        2      2        3      2        4      2
    3 c            1      3        2      3        3      3        4      3
    4 d            1      4        2      4        3      4        4      4
    5 e            1      5        2      5        3      5        4      5
    

    data

    df <- data.frame(id= rep(letters[1:5],3,20), visits = rep(1:4,each=5), data =rep(1:5,3,20))