Search code examples
rdplyrtidyselect

Failure to rename duplicate column names with dplyr rename() and rename_with()


Given a tibble or data.frame that has duplicate column names, I want to use dplyr::rename or dplyr::rename_with to either:
(a) differentiate the duplicate names with a serial numeric suffix ('a_1', 'a_2', etc) or
(b) rename each column entirely.

Have:

library(tidyverse)

d <- tibble(a = 1:3, a = letters[1:3], .name_repair = "minimal")

d
# A tibble: 3 x 2
      a a    
  <int> <chr>
1     1 a    
2     2 b    
3     3 c  

Want:

tibble(a_1 = 1:3, a_2 = letters[1:3])

# A tibble: 3 x 2
    a_1 a_2           # or even just: x, y    
  <int> <chr>
1     1 a    
2     2 b    
3     3 c   

It seems reasonable to expect that rename/rename_with could perform this operation, especially since colnames(d) <- c("a_1", "a_2") seems obvious and uncomplicated.

But I've gotten different errors and/or unexpected behavior with the three approaches I've tried so far:

1. Using rename(), one column is renamed, the other is not:

d %>% rename(x = "a", y = "a")  

# A tibble: 3 x 2
        y a    
    <int> <chr>
  1     1 a    
  2     2 b    
  3     3 c   

2. Using rename_with(), in which I use a function to add a numeric suffix, I get this error:

d %>% rename_with(~paste(.x, 1:2, sep = "_"))

Error: Names must be unique.
x These names are duplicated:
  * "a" at locations 1 and 2.

Note that this use of rename_with works as expected when data has no duplicate column names:

no_dupe <- tibble(a = 1:3, b = letters[1:3])
no_dupe %>% rename_with(~paste(.x, 1:2, sep = "_"))

# A tibble: 3 x 2
    a_1 b_2  
  <int> <chr>
1     1 a    
2     2 b    
3     3 c    

3. Using the older rename_all(), I get a different error:

d %>% rename_all(paste0, 1:2)

Error: Can't rename duplicate variables to `{name}`.

I found a discussion about handling duplicates with rename in the tidyselect GitHub issues, but that was about what to do if a user creates duplicate column names with rename(), not what to do if they are trying to unduplicate.

Am I missing syntax, or is the dplyr::rename family just not set up to ingest duplicate colnames?

(I'd also like to better understand why rename only renames one column in the example above, but that's less functional and more just me being curious.)

Thanks in advance.


Solution

  • rename does not know about the previous change in name done. For example,

    library(dplyr)
    mtcars %>% rename(a = mpg, b = a)
    

    Error: Can't rename columns that don't exist. x Column a doesn't exist.

    So this

    d %>% rename(x = "a", y = "a")  
    

    renames the same a column twice, first with x and then with y. An alternative is to break the pipe and rename.

    d %>% rename(x = "a") %>% rename(y = "a")  
    
    # A tibble: 3 x 2
    #      x y    
    #  <int> <chr>
    #1     1 a    
    #2     2 b    
    #3     3 c 
    

    rename_with does not allow to rename dataframe/tibbles with duplicate columns because of https://github.com/tidyverse/dplyr/blob/master/R/rename.R#L70 . In such case I think your best bet is to use base R methods.