I have a list of words, and in Python 3 I count the number of differing letters between each combination of words (using a clever diff_summing approach from this site):
import itertools

def diff_letters(a, b):
    return sum(a[i] != b[i] for i in range(len(a)))

w = ['AAHS', 'AALS', 'DAHS', 'XYZA']
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        print(x, y)
This prints:
AAHS AALS
AAHS DAHS
My question: How can I count and record that the strings 'DAHS' and 'AALS' have exactly one partner, and 'AAHS' has two partners? I'll be filtering for directional combinations where each target_word has exactly one near_matching_word, so my final data (as JSON) would look like this:
[
    {
        "target_word": "DAHS",
        "near_matching_word": "AAHS"
    },
    {
        "target_word": "AALS",
        "near_matching_word": "AAHS"
    }
]
(noticing that AAHS doesn't appear as a target_word)
I have one version using functools.reduce:
import itertools
import functools
import operator

def diff_letters(a, b):
    return sum(a[i] != b[i] for i in range(len(a)))

w = ['AAHS', 'AALS', 'DAHS', 'XYZA']

pairs = []
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        # print(x, y)
        pairs.append((x, y))

full_list = functools.reduce(operator.add, pairs)
for x in full_list:
    if full_list.count(x) == 1:
        print(x)
which prints
AALS
DAHS
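Here functools.reduce(operator.add, pairs) just flattens the list of pairs into the single tuple ('AAHS', 'AALS', 'AAHS', 'DAHS'). A minimal equivalent using itertools.chain.from_iterable (an alternative I'm only sketching here, not part of my original code) avoids re-concatenating tuples as pairs grows:

# Flatten the list of pairs without repeated tuple concatenation.
full_list = list(itertools.chain.from_iterable(pairs))
# ['AAHS', 'AALS', 'AAHS', 'DAHS']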
But then I would have to go back to my big list pairs to find the near_matching_word. Of course, in my final version, the list pairs will be much larger, and the target_word could be either the 1st or 2nd item in the tuple (x, y).
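A collections.Counter over the flattened full_list gives me the counts I'm after (a minimal sketch reusing full_list from above), but it still doesn't record which word the single partner is:

from collections import Counter

# Count how many near-matching partners each word has, regardless of
# whether it appeared as the 1st or 2nd item of a pair.
partner_counts = Counter(full_list)
print(partner_counts)  # Counter({'AAHS': 2, 'AALS': 1, 'DAHS': 1})

# Words with exactly one partner:
print([word for word, n in partner_counts.items() if n == 1])  # ['AALS', 'DAHS']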
You could use a dictionary instead of a list of pairs:
pairs = {}
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        pairs.setdefault(x, []).append(y)
        pairs.setdefault(y, []).append(x)

result = [{"target_word": key, "near_matching_word": head}
          for key, (head, *tail) in pairs.items()
          if not tail]
print(result)
Output
[{'target_word': 'AALS', 'near_matching_word': 'AAHS'}, {'target_word': 'DAHS', 'near_matching_word': 'AAHS'}]
In the pairs dictionary the keys are the target_words and the values are lists of near_matching_words. The list comprehension then filters out any word with more than 1 near_matching_word: the (head, *tail) unpacking makes tail empty only when the list holds a single element, so if not tail keeps exactly those entries.
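If you need the exact JSON shape from the question, result can be serialized directly (a small addition on top of the answer above):

import json

print(json.dumps(result, indent=2))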