Tags: python, pandas, fuzzywuzzy

From indices to field name in pandas dataframe


I need to map row indices back to the corresponding values in the word column. My dataset is as follows:

import pandas as pd
from fuzzywuzzy import fuzz
try_test = pd.DataFrame({'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'], 
                         'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken']})

    word    name
0   apple   dog
1   orange  cat
2   diet    mad cat
3   energy  good dog
4   fire    bad dog
5   cake    chicken

Using this function:

def func(name):
    matches = try_test.apply(lambda row: (fuzz.partial_ratio(row['name'], name) >= 85), axis=1)
    return [i for i, x in enumerate(matches) if x]

try_test.apply(lambda row: func(row['name']), axis=1)

I got the following values:

0    [0, 3, 4]
1       [1, 2]
2       [1, 2]
3       [0, 3]
4       [0, 4]
5          [5]

I would like to get the corresponding values from the word column instead of the indices.

Expected output:

0    [apple, energy, fire]
1       [orange, diet]
2       [orange, diet]
3       [apple, energy]
4       [apple, fire]
5          [cake]

Any suggestions would be greatly appreciated.


Solution

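  • In your function, return try_test.word[i] instead of i. This relies on try_test having the default 0..n-1 index, so that the position from enumerate matches the row label: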

    def func(name):
        matches = try_test.apply(lambda row: (fuzz.partial_ratio(row['name'], name) >= 85), axis=1)
        return [try_test.word[i] for i, x in enumerate(matches) if x]
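
A possible variation, offered as a sketch rather than the accepted fix: build a boolean mask and select the word column with it, which avoids looking rows up by position at all. The names matching_words and substring_score are hypothetical and not part of the original post. substring_score is only a stand-in scorer so the snippet runs without fuzzywuzzy installed; it happens to reproduce the expected output for this dataset, but it is not equivalent to fuzz.partial_ratio in general.

```python
import pandas as pd

try_test = pd.DataFrame({
    'word': ['apple', 'orange', 'diet', 'energy', 'fire', 'cake'],
    'name': ['dog', 'cat', 'mad cat', 'good dog', 'bad dog', 'chicken'],
})

def substring_score(a, b):
    # Hypothetical stand-in scorer (an assumption, not a fuzzywuzzy function):
    # 100 when the shorter string occurs inside the longer one, otherwise 0.
    shorter, longer = sorted((a, b), key=len)
    return 100 if shorter in longer else 0

def matching_words(name, df=try_test, scorer=substring_score, threshold=85):
    # Boolean mask: True for every row whose 'name' scores at or above the threshold.
    mask = df['name'].apply(lambda other: scorer(other, name) >= threshold)
    # Select 'word' through the mask, so the result does not depend on
    # row labels being equal to row positions.
    return df.loc[mask, 'word'].tolist()

# One list of matching words per row of the original frame.
result = try_test['name'].apply(matching_words)
print(result)
```

To use the scorer from the question, pass scorer=fuzz.partial_ratio after from fuzzywuzzy import fuzz. Because the selection goes through a mask, this version should also behave the same if the DataFrame has a non-default index, where try_test.word[i] would look up labels rather than positions.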