
Changing Duplicate Values Within Subjects: R


My data looks like this:

Country GDP Year
A 10 1972
A 15 1973
A 20 1973
A 18 1975
B 25 1950
B 30 1951
B 35 1951
B 36 1953

I have many observations that look like the data above. I want to change the duplicated years, but only in the first row of each duplicated pair, so that its year moves forward by one. I want my data to look like this:

Country GDP Year
A 10 1972
A 20 1973
A 15 1974
A 18 1975
B 25 1950
B 35 1951
B 30 1952
B 36 1953

Thank you for your time!


Solution

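  • Here is one possible option with tidyverse. It flags the row with the lowest GDP within each repeated Country/Year pair (in the sample data, that is the first of the two rows) and moves it to the following year: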

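    library(tidyverse)

    df %>%
      group_by(Country, Year) %>%
      # Flag the lowest-GDP row of every Country/Year pair that occurs more than once
      mutate(dup = case_when(n() == 1 ~ FALSE,
                             min(GDP) == GDP ~ TRUE,
                             TRUE ~ FALSE)) %>%
      # Ungroup before changing Year, because Year is a grouping variable
      ungroup() %>%
      # Move each flagged row to the following year
      mutate(Year = ifelse(dup, Year + 1, Year)) %>%
      arrange(Country, Year) %>%
      select(-dup)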
    

    Output

      Country   GDP  Year
      <chr>   <int> <dbl>
    1 A          10  1972
    2 A          20  1973
    3 A          15  1974
    4 A          18  1975
    5 B          25  1950
    6 B          35  1951
    7 B          30  1952
    8 B          36  1953
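
The option above picks the row to move by its GDP value. If the row should instead be chosen by position, as the question's wording ("first duplicated row") suggests, a minimal sketch could look like the following. The names `df`, `shift`, and `result` are assumptions made for illustration, and the sample data is rebuilt by hand from the question. The sketch also assumes that the following year is not already taken within the same country.

```r
library(dplyr)

# Sample data copied from the question; the name `df` is assumed.
df <- data.frame(
  Country = c("A", "A", "A", "A", "B", "B", "B", "B"),
  GDP     = c(10L, 15L, 20L, 18L, 25L, 30L, 35L, 36L),
  Year    = c(1972, 1973, 1973, 1975, 1950, 1951, 1951, 1953),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Country, Year) %>%
  # Flag only the first row of a repeated Country/Year pair,
  # based on its position in the data rather than on its GDP value.
  mutate(shift = n() > 1 & row_number() == 1) %>%
  # Ungroup before changing Year, because Year is a grouping variable.
  ungroup() %>%
  # TRUE counts as 1, so flagged rows move to the following year.
  mutate(Year = Year + shift) %>%
  arrange(Country, Year) %>%
  select(-shift)

result
```

For the sample data this gives the same rows as the output above, because the first row of each duplicated pair is also the one with the lower GDP. The two approaches differ only when that is not the case.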