Given a column of strings of different lengths (e.g. "Apple", "Pear", "cucumber", "watermelon"), there are 27 letters in total. The aim is to randomly choose 10%, 20%, ..., 100% of these 27 letters and replace them with random ASCII letters, taking word length into account so that more letters are chosen from longer words like "watermelon" (10 letters) and fewer from shorter words like "Pear" (4 letters).
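One way to hit exactly 10%, 20%, ..., 100% of the letters is to pick positions uniformly at random from all letter positions at once. Because every position is equally likely, a 10-letter word receives on average 2.5 times as many picks as a 4-letter word, which gives the length weighting without any extra bookkeeping. This is a minimal sketch, assuming lowercase ASCII replacements and that a replaced letter should always differ from the original; `corrupt_fraction` is a hypothetical helper name.

```python
import random
import string

def corrupt_fraction(words, fraction, rng=random):
    """Replace round(fraction * total_letters) letters, chosen uniformly
    across all words, with a different random lowercase letter."""
    # Every (word index, letter index) pair is one candidate position.
    positions = [(w, c) for w, word in enumerate(words) for c in range(len(word))]
    k = round(fraction * len(positions))
    chars = [list(word) for word in words]
    # random.sample picks k distinct positions, so no letter is hit twice.
    for w, c in rng.sample(positions, k):
        original = chars[w][c].lower()
        # Assumption: exclude the original letter so every pick is a real typo.
        choices = [ch for ch in string.ascii_lowercase if ch != original]
        chars[w][c] = rng.choice(choices)
    return ["".join(cs) for cs in chars]

words = ["Apple", "Pear", "cucumber", "watermelon"]
for pct in range(10, 101, 10):
    print(f"{pct}%:", corrupt_fraction(words, pct / 100))
```

With 27 letters, 10% rounds to 3 replaced letters and 100% replaces all 27; how those picks are spread across the words varies from run to run.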
P.S.: My goal is to simulate typos in a list of words and then apply the Levenshtein distance to find the best match by comparing each word containing typos with the correct forms of the words (e.g. converting "Apple" to "apfle" and then using LD to correct it back to "Apple").
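For the correction step, a plain dynamic-programming Levenshtein distance is enough to pick the closest correct word. This is a minimal sketch, assuming the list of correct words is known in advance and that matching should ignore case; `levenshtein` and `best_match` are hypothetical helper names, and a compiled library implementation would be faster on large vocabularies.

```python
def levenshtein(a, b):
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def best_match(typo, vocabulary):
    """Return the vocabulary word closest to `typo` (first one wins on ties)."""
    return min(vocabulary, key=lambda word: levenshtein(typo.lower(), word.lower()))

vocabulary = ["Apple", "Pear", "cucumber", "watermelon"]
print(best_match("apfle", vocabulary))  # Apple
```

Note that ties are broken by vocabulary order, so heavily corrupted words may be "corrected" to the wrong entry; that is expected at the higher typo rates.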
I'm not sure I understood correctly, but if I did, maybe you can try something like this:
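```python
import random
import string

letters = string.ascii_lowercase
list_of_words = ["Apple", "Pear", "cucumber", "watermelon"]  # example input
output = []
for element in list_of_words:
    element = list(element)  # strings are immutable, so edit a list of characters
    for n in range(len(element)):
        if random.randint(0, 9) == 9:  # true for 1 of 10 equally likely values
            element[n] = random.choice(letters)
    output.append("".join(element))  # join the characters back into a word
print(output)
```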
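This iterates over each letter of each word in the list and replaces it with a random lowercase letter with a 10% probability, then saves the resulting words to a second list. Because each letter is treated independently, longer words receive more replacements on average, but the total is only about 10% of the letters rather than exactly 10%, and a letter can occasionally be "replaced" by itself.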
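To cover the 10%, 20%, ..., 100% levels from the question with this per-letter approach, the fixed `randint(0, 9) == 9` check can be generalized to a rate parameter. This is a sketch under two assumptions: the rate is a float between 0 and 1, and the replacement letter should differ from the original so every hit is a visible typo. `add_typos` is a hypothetical name.

```python
import random
import string

def add_typos(words, rate, rng=random):
    """Replace each letter independently with probability `rate`.

    Longer words get more typos on average, but the total number of
    replaced letters varies around rate * total_letters from run to run.
    """
    output = []
    for word in words:
        chars = list(word)
        for n, ch in enumerate(chars):
            if rng.random() < rate:
                # Exclude the original letter so the replacement is a real typo.
                chars[n] = rng.choice([c for c in string.ascii_lowercase if c != ch.lower()])
        output.append("".join(chars))
    return output

list_of_words = ["Apple", "Pear", "cucumber", "watermelon"]
for pct in range(10, 101, 10):
    print(f"{pct}%:", add_typos(list_of_words, pct / 100))
```

If an exact number of replaced letters is required, sampling letter positions without replacement (for example with `random.sample`) is the usual alternative to per-letter coin flips.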