seach for substring with minimum characters match pandas

I have 1st dataFrame with column 'X' as :

2nd dataFrame with column 'Y' as :

Y
000A561044          
000A872220

I would like to match the part of substrings from both columns with minimum no. of characters(example 7 chars only alphanumeric to be considered for matching. all special chars to be excluded). so, my output DataFrame should be like this

X
A561044-2

Any possible solution would highly appreciate.

Thanks in advance.

Solution

IIUC and assuming that the first three values of Y start with 0, you can slice Y by [3:] to remove the first three zero values. Then, you can join these values by |. Finally, you can create your mask using contains that checks whether a series contains a specified value (in your case you would have something like 'A|B' and check whether a value contains 'A' or 'B'). Then, this mask can be used to query your other data frame.

Code:

import pandas as pd

df1 = pd.DataFrame({"X": ["A468593-3", "A697269-2", "A561044-2", "A239882 04"]})
df2 = pd.DataFrame({"Y": ["000A561044", "000A872220"]})

mask = df1["X"].str.contains(f'({"|".join(df2["Y"].str[3:])})')
df1.loc[mask]

Output:

    X
2   A561044-2

If you have values in Y that do not start with three zeros, you can use this function to reduce your columns and remove all first n zeros.

def remove_first_numerics(s):
  counter = 0
  while s[counter].isnumeric():
    counter +=1
  return s[counter:]

df_test = pd.DataFrame({"A": ["01Abd3Dc", "3Adv3Dc", "d31oVgZ", "10dZb1B", "CCcDx10"]})
df_test["A"].apply(lambda s: remove_first_numerics(s))

Output:

0     Abd3Dc
1     Adv3Dc
2    d31oVgZ
3      dZb1B
4    CCcDx10
Name: A, dtype: object