Replace duplicate elements in a column in R


I have a data.frame that looks like this:

columnA=c(1,2,3,1.1,2.2,3.3,1,2)
columnB=c("a","b","c","d","e","f","g","h")

data=data.frame(columnA, columnB)

  columnA columnB
1     1.0       a
2     2.0       b
3     3.0       c
4     1.1       d
5     2.2       e
6     3.3       f
7     1.0       g
8     2.0       h

I would like to find the duplicates in column A and replace them with the elements from the same row in column B, storing the result in a new column C, like this:

  columnA columnB  columnC
1     1.0       a    1.0
2     2.0       b    2.0
3     3.0       c    3.0
4     1.1       d    1.1
5     2.2       e    2.2
6     3.3       f    3.3
7     1.0       g     g
8     2.0       h     h

where the duplicates 1.0 and 2.0 in rows 7 and 8 of column A have been replaced with the corresponding elements in rows 7 and 8 of column B (g and h).

Any help would be highly appreciated. I have been struggling with this for a long time.


Solution

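  • Here is another option. Group by columnA; within each group, keep columnA (converted to character) for the first occurrence and use columnB for any later occurrence. columnC ends up as a character column because it holds both numbers and letters.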

    library(tidyverse)
    
    data <- tibble(columnA = c(1,2,3,1.1,2.2,3.3,1,2), 
                   columnB =c("a","b","c","d","e","f","g","h"))
    
    data %>%
      group_by(columnA) %>%
      mutate(columnC = ifelse(row_number() == 1, as.character(columnA), columnB))
    #> # A tibble: 8 x 3
    #> # Groups:   columnA [6]
    #>   columnA columnB columnC
    #>     <dbl> <chr>   <chr>  
    #> 1     1   a       1      
    #> 2     2   b       2      
    #> 3     3   c       3      
    #> 4     1.1 d       1.1    
    #> 5     2.2 e       2.2    
    #> 6     3.3 f       3.3    
    #> 7     1   g       g      
    #> 8     2   h       h
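
For comparison, the same "first occurrence keeps A, later occurrences take B" rule can probably be written in base R without any grouping. This is only a minimal sketch, not the answerer's method: it assumes the data.frame from the question, and the helper name `is_dup` is made up for illustration. It relies on `duplicated()`, which flags every repeat of a value after its first appearance.

```r
# Rebuild the data.frame from the question
columnA <- c(1, 2, 3, 1.1, 2.2, 3.3, 1, 2)
columnB <- c("a", "b", "c", "d", "e", "f", "g", "h")
data <- data.frame(columnA, columnB, stringsAsFactors = FALSE)

# is_dup (hypothetical name) is TRUE for each value of columnA
# that has already appeared in an earlier row
is_dup <- duplicated(data$columnA)

# Use columnB for repeats, otherwise columnA converted to character
data$columnC <- ifelse(is_dup, data$columnB, as.character(data$columnA))

print(data)
```

As in the grouped version, columnC is a character column, so the numbers from columnA are stored as text. Note that `duplicated()` compares values exactly, so floating-point values that differ only by rounding error would not be treated as duplicates.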