python, algorithm, machine-learning, pyspark, fuzzy-logic

Develop a Python/PySpark program to display similar kinds of words


It should print the similar words in one column.

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
query = "Apple"
#set of DATA 25 records
choices = ["apil",
    "apple",
    "Apille",
    "aple",
    "apil",
    "appple",
    "Apple APPLE",
    "Apil Orange",
    "apples"
]
process.extract(query, choices)
#### Printing Accuracy Value
print ("List of ratios: ")
print (process.extract(query, choices), "\n")
#process.extractone(query, choices)
print ("\nBest among the above list ----->",process.extractOne(query, choices))

Output:

List of ratios:

[('apple', 100), ('appple', 91), ('apples', 91), ('Apple APPLE', 90), ('aple', 89)]

Best among the above list -----> ('apple', 100)


Solution

  • I only had to change one line and add another to your snippet. Comments mark both changes and explain what they do. I wasn't sure about the exact output format you wanted, so feel free to ask again if this isn't it.

    Take a look at list comprehensions if you want to dig deeper into how the last line works.

    from fuzzywuzzy import fuzz
    from fuzzywuzzy import process
    query = "Apple"
    #set of DATA 25 records
    choices = ["apil",
        "apple",
        "Apille",
        "aple",
        "apil",
        "appple",
        "Apple APPLE",
        "Apil Orange",
        "apples"
    ]
    # 1st change here
    # The next line stores a tuple of each choice and its similarity score in a list. The entries appear to be sorted by score, as the output of your snippet shows.
    ordered_choices = process.extract(query, choices)
    #### Printing Accuracy Value
    print ("List of ratios: ")
    print (process.extract(query, choices), "\n")
    #process.extractone(query, choices)
    print ("\nBest among the above list ----->",process.extractOne(query, choices))
    
    # 2nd change here
    # The following line takes the first element of each tuple in the list and adds it to another list, which is then printed.
    print("\nOrdered choices: ", [choice for choice, value in ordered_choices])
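
If "one column" means one word per line, the same list comprehension can feed a newline join. The sketch below is only an illustration: to stay runnable without fuzzywuzzy installed, it hard-codes the list of tuples from the output in your question instead of calling process.extract, and the names similar_words, similar_words_loop and column are made up for this example.

```python
# Hard-coded copy of the list shown in the question's output, so this runs
# without fuzzywuzzy. In the real script this would be the value returned by
# process.extract(query, choices).
ordered_choices = [
    ("apple", 100),
    ("appple", 91),
    ("apples", 91),
    ("Apple APPLE", 90),
    ("aple", 89),
]

# The comprehension unpacks each (choice, score) tuple and keeps only the choice.
similar_words = [choice for choice, score in ordered_choices]

# Equivalent explicit loop, shown for comparison with the comprehension.
similar_words_loop = []
for choice, score in ordered_choices:
    similar_words_loop.append(choice)

# Joining with newlines puts one word per line, i.e. a single column.
column = "\n".join(similar_words)
print(column)
```

Note that process.extract has a limit parameter that defaults to 5, which is why only five matches show up in your output. For the full set of 25 records you would probably want to pass a larger limit.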