
How can I change readr's name_repair behavior so that duplicate column names are numbered by occurrence rather than by column position?


Suppose I have this csv file:

asdf,qwer,asdf,qwer,qwer
1,2,3,4,5

If I read it with readr::read_csv("some.csv"), the duplicated column names are repaired with suffixes based on each column's position:

# A tibble: 1 × 5
  asdf...1 qwer...2 asdf...3 qwer...4 qwer...5
     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1        1        2        3        4        5

What could I do if I'd rather have suffixes based on how many times a name has already appeared, with the first occurrence left unmodified, like this:

# A tibble: 1 × 5
   asdf  qwer asdf_1 qwer_1 qwer_2
  <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1     1     2      3      4      5

Hint

It seems possible to use the name_repair argument of read_csv and provide a function.


Solution

  • Since name_repair= can be a function, we can handle this programmatically. Fortunately, base::make.unique does most of the work: it leaves the first occurrence of each name untouched and appends a counter to later duplicates. Setting sep="_" gives your exact output.

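    # Number duplicates as name_1, name_2, ...; the first occurrence keeps its name.
    namefun <- function(nm) make.unique(nm, sep = "_")
    # Literal CSV text: readr treats a string containing a newline as data, not a path.
    txt <- 'asdf,qwer,asdf,qwer,qwer
    1,2,3,4,5'
    readr::read_csv(txt, name_repair = namefun)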
    # Rows: 1 Columns: 5
    # ── Column specification ───────────────────────────────────────────────────────────────────────────────────────────
    # Delimiter: ","
    # dbl (5): asdf, qwer, asdf_1, qwer_1, qwer_2
    # ℹ Use `spec()` to retrieve the full column specification for this data.
    # ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    # # A tibble: 1 × 5
    #    asdf  qwer asdf_1 qwer_1 qwer_2
    #   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
    # 1     1     2      3      4      5
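
To see what the repair function does without reading a file, it can be called directly on a character vector of header names. This is a minimal sketch using base R only; `repair_names` and `header` are illustrative names, not part of readr.

```r
# Hypothetical helper name; same body as `namefun` in the answer above.
repair_names <- function(nm) make.unique(nm, sep = "_")

# The header row from the example CSV, as a plain character vector.
header <- c("asdf", "qwer", "asdf", "qwer", "qwer")

repaired <- repair_names(header)
print(repaired)
# [1] "asdf"   "qwer"   "asdf_1" "qwer_1" "qwer_2"
```

The same function can be passed as `name_repair = repair_names`. readr's documentation also describes accepting a purrr-style anonymous function, so `name_repair = ~ make.unique(.x, sep = "_")` should behave the same way, assuming a readr version that supports that form.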