r, dplyr, stringr, stringdist

How to replace values in a data frame with matching values from another data frame in R


I want to replace the values in df1 with the matching values from df2, where df2 holds the clean versions of the names in df1. For example:

df1 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. MAHJUM-61365. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "A. RIZAL. AD-11002795. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KADIR-60447ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY",
    "ABDUL AZIZ @HYUNDAI",
    "ABDUL AZIZ@HYUNDAI"
  ))

and df2 is:

df2 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY"
  ))

If a value in df1 looks like a value in df2, it should be replaced with the df2 value.


Solution

  • Because this is a substring match, we can use fuzzyjoin:

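    library(dplyr)
    library(fuzzyjoin)
    # Join rows where df1$name matches df2$name treated as a regex;
    # keep the df2 name when there is a match, else the original df1 name
    regex_left_join(df1, df2, by = 'name') %>%
      transmute(name = coalesce(name.y, name.x))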
    

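    Or use a distance-based approach, which catches small typos rather than repeated substrings:

    # Join rows whose names are within max_dist edits of each other (default 2)
    stringdist_left_join(df1, df2, by = 'name') %>%
      transmute(name = coalesce(name.y, name.x))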
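
The two joins above cover different cases: the regex join handles repeated substrings and the distance join handles typos. If both are needed in one pass, a minimal base R sketch might look like the following. Note that `clean_names` is a hypothetical helper written for illustration, not a fuzzyjoin function. It assumes a literal substring match (`fixed = TRUE`, so the `.` in the names is not treated as a regex wildcard) should win over an edit-distance match computed with `adist`, and it reuses a few of the names from the question as sample data.

```r
# Hypothetical helper (not part of fuzzyjoin): map each messy name to a clean one.
clean_names <- function(messy, clean, max_dist = 2) {
  vapply(messy, function(x) {
    # 1. Literal substring match; fixed = TRUE stops "." acting as a wildcard.
    hit <- clean[vapply(clean, grepl, logical(1), x = x, fixed = TRUE)]
    if (length(hit) > 0) return(hit[[1]])
    # 2. Otherwise take the closest clean name by edit distance (utils::adist).
    d <- adist(x, clean)[1, ]
    if (min(d) <= max_dist) return(clean[[which.min(d)]])
    # 3. No acceptable match: keep the original value.
    x
  }, character(1), USE.NAMES = FALSE)
}

# Sample data taken from the question
clean <- c("A. MAHJUM-61365", "ABD. KADIR-60447", "ABDUL AZIS @HYUNDAY")
messy <- c("A. MAHJUM-61365. MAHJUM-61365",
           "ABD. KADIR-60447ABD. KADIR-60447",
           "ABDUL AZIZ @HYUNDAI",
           "ABDUL AZIZ@HYUNDAI")

result <- clean_names(messy, clean)
```

With the default `max_dist = 2`, the last value ("ABDUL AZIZ@HYUNDAI") is three edits away from "ABDUL AZIS @HYUNDAY" and is therefore left unchanged; raising `max_dist` to 3 would map it as well, at the cost of a higher risk of false matches on other data.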