Search code examples
rdataframedplyrpivotdata-cleaning

How to solve error related to name repair startegy with double pivot_longer


I have been trying to convert separately columns as long format, by using the following code:

mtcars %>% 
  pivot_longer(1:3, values_to = 'values') %>% 
  pivot_longer(6:7, values_to = 'values1')

However, I am getting the following error

**Error in `pivot_longer()`:
! Names must be unique.
x These names are duplicated:
  * "name" at locations 7 and 9.
i Use argument `names_repair` to specify repair strategy.
Run `rlang::last_trace()` to see where the error occurred.**

can anyone know how to avoid this error and how to separately convert different chunks of columns as long format, possibly in the same pivot_longer call?

Thanks


Solution

  • The answer can be seen by looking at the documentation, and looking at the first part of your code.

    The documentation

    These are the arguments for pivot_longer(), along with their default arguments, if any:

    pivot_longer(
      data,
      cols,
      ...,
      cols_vary = "fastest",
      names_to = "name", <--- important part
      names_prefix = NULL,
      names_sep = NULL,
      names_pattern = NULL,
      names_ptypes = NULL,
      names_transform = NULL,
      names_repair = "check_unique",
      values_to = "value",
      values_drop_na = FALSE,
      values_ptypes = NULL,
      values_transform = NULL
    )
    

    As you can see, if you don't give a name for the name column (with the names_to argument), it defaults to the name "name".

    Your code

    Running mtcars %>% pivot_longer(1:3, values_to = 'values'), we get the output:

    # A tibble: 96 × 10
          hp  drat    wt  qsec    vs    am  gear  carb name  values
       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>  <dbl>
     1   110  3.9   2.62  16.5     0     1     4     4 mpg     21  
     2   110  3.9   2.62  16.5     0     1     4     4 cyl      6  
     3   110  3.9   2.62  16.5     0     1     4     4 disp   160  
     4   110  3.9   2.88  17.0     0     1     4     4 mpg     21  
     5   110  3.9   2.88  17.0     0     1     4     4 cyl      6  
     6   110  3.9   2.88  17.0     0     1     4     4 disp   160  
     7    93  3.85  2.32  18.6     1     1     4     1 mpg     22.8
     8    93  3.85  2.32  18.6     1     1     4     1 cyl      4  
     9    93  3.85  2.32  18.6     1     1     4     1 disp   108  
    10   110  3.08  3.22  19.4     1     0     3     1 mpg     21.4
    # ℹ 86 more rows
    

    Notice the 'name' column on the end? When we try to run pivot_longer(6:7, values_to = 'values1') after this, we're trying to create a 'name' column again, but we already have one, so dplyr freaks out and throws this error.

    There's a few things we can do:

    1. assign names to the name columns (e.g. use names_to = "name1" for the first one, and names_to = "name2" for the second)
    2. add an option to names_repair: to quote the documentation -

    The default, "check_unique" is to error if the columns are duplicated. Use "minimal" to allow duplicates in the output, or "unique" to de-duplicated by adding numeric suffixes. See vctrs::vec_as_names() for more options.

    1. use names_to = NULL to drop the names column