Search code examples
pythonlevenshtein-distance

Replace values ​in a column with similar values ​in another column with different size - Python


I have a dataframe with different values ​​in a column (about 6,000 rows), which I need to replace with similar (but differents) values ​​found in another dataframe, which has fewer rows.

Store Values to replace
Store A 05/15/21 Store A
The Store B 04/01/21 Store B
Store letter B 11/12/21 Store C
Store C 10/24/21 Store D
Store D 09/30/21
the Store C 05/13/21
Store letter D 07/01/21
Store letter A 08/29/21

The goal is for the final dataframe to look like this:

Store
Store A
Store B
Store B
Store C
Store D
Store C
Store D
Store A

Solution

  • Given df and df2:

                         Store
    0         Store A 05/15/21
    1     The Store B 04/01/21
    2  Store letter B 11/12/21
    3         Store C 10/24/21
    4         Store D 09/30/21
    5     the Store C 05/13/21
    6  Store letter D 07/01/21
    7  Store letter A 08/29/21
    
      replace_values
    0        Store A
    1        Store B
    2        Store C
    3        Store D
    

    Doing:

    df.Store = df.Store.str.split()
    df2.replace_values = df2.replace_values.str.split()
    df.Store = df.Store.apply(lambda row: ' '.join(x for x in row if any(x in i for i in df2.replace_values)))
    print(df)
    

    Output:

         Store
    0  Store A
    1  Store B
    2  Store B
    3  Store C
    4  Store D
    5  Store C
    6  Store D
    7  Store A