
Fuzzy match ranking


I fuzzy-matched a list of movie titles against each other and collected every comparison, along with its match score, into another list:

>>> fuzzy_matches
[(['White Warrior (Alpha Video)'], ['White Warrior (Alpha Video)'], 100), (['White Warrior (Alpha Video)'], ['White Warrior (Digiview Entertainment)'], 63), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum)'], 78), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / David And Goliath'], 63), (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / Duel Of Champions'], 61)]...etc
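Each entry in fuzzy_matches is a 3-tuple: a one-element list holding the query title, a one-element list holding the candidate title, and the integer match score. A minimal sketch, assuming every entry has exactly that shape, of unpacking one entry by name instead of indexing it with [0][0] and [2]:

```python
# One entry copied from fuzzy_matches above.
match = (['White Warrior (Alpha Video)'], ['White Warrior (Platinum)'], 78)

# Nested unpacking: each one-element list is unpacked into a plain string.
# This raises ValueError if a list does not hold exactly one title.
[query], [candidate], score = match

print(query)      # White Warrior (Alpha Video)
print(candidate)  # White Warrior (Platinum)
print(score)      # 78
```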

I want to add up the match values for each title so that I get output like this:

[(['White Warrior (Alpha Video)'], 248),
 (['White Warrior 2 (Digiview Entertainment)'], 390),
 etc...]

I have tried several implementations using slices, but they are ugly.

(Not my exact code, but this shows the ugliness):

for x in range(len(fuzzed)):
    for y in range(len(fuzzed)):
        big_dict[fuzzy_matches[55][0][0]] = fuzzy_matches[55][2] + fuzzy_matches[56][2]...

What is a more efficient way to accomplish this?


Solution

  • You can use a dict to accumulate the total score for each title, and then, if you want a list of tuples at the end, you can use dict.items() (Python 3.x) to get them.

    Example:

    >>> fuzzy_matches = [(['White Warrior (Alpha Video)'], ['White Warrior (Alpha Video)'], 100),
    ...                  (['White Warrior (Alpha Video)'], ['White Warrior (Digiview Entertainment)'], 63),
    ...                  (['White Warrior (Alpha Video)'], ['White Warrior (Platinum)'], 78),
    ...                  (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / David And Goliath'], 63),
    ...                  (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / Duel Of Champions'], 61)]
    >>>
    >>> fuzzy_dict = {}
    >>> for i in fuzzy_matches:
    ...     # i[0][0] is the query title, i[2] is the match score
    ...     if i[0][0] not in fuzzy_dict:
    ...         fuzzy_dict[i[0][0]] = 0
    ...     fuzzy_dict[i[0][0]] += i[2]
    ...
    >>> fuzzy_dict
    {'White Warrior (Alpha Video)': 365}
    >>> list(fuzzy_dict.items())
    [('White Warrior (Alpha Video)', 365)]
    

    You do not need list(...) at the end if you are using Python 2.x, where dict.items() already returns a list.
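    The same accumulation can be written with collections.Counter from the standard library, which treats a missing key as 0, so the "not in" check is unnecessary. A minimal sketch, assuming each entry has the ([query], [candidate], score) shape shown above; since the question is about ranking, Counter.most_common() also returns the totals sorted from highest to lowest:

```python
from collections import Counter

fuzzy_matches = [
    (['White Warrior (Alpha Video)'], ['White Warrior (Alpha Video)'], 100),
    (['White Warrior (Alpha Video)'], ['White Warrior (Digiview Entertainment)'], 63),
    (['White Warrior (Alpha Video)'], ['White Warrior (Platinum)'], 78),
    (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / David And Goliath'], 63),
    (['White Warrior (Alpha Video)'], ['White Warrior (Platinum) / Duel Of Champions'], 61),
]

# Counter returns 0 for a missing key, so += works the first time a title is seen.
totals = Counter()
for [query], _candidate, score in fuzzy_matches:
    totals[query] += score

# most_common() with no argument returns every (title, total) pair, highest total first.
ranking = totals.most_common()
print(ranking)  # [('White Warrior (Alpha Video)', 365)]
```

    If you only need the sums, dict(totals) gives a plain dict equivalent to fuzzy_dict; collections.defaultdict(int) would work the same way for accumulation, but it does not provide most_common().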