python, algorithm, conceptual

Conceptual: Collect "synonyms" from a list of "words"


This question is inspired by "Generating a list of repetitions regardless of the order" and its accepted answer: https://stackoverflow.com/a/20336020/1463143

Here, an "alphabet" is any set of letters, e.g. '012' or 'EDCRFV'.

"Words" are obtained by taking the Cartesian product of the alphabet with itself. We should be able to specify n to get n-letter words. Example:

from itertools import product
alphabet = '012'
wordLen = 3
# each element of the product is a tuple of letters; join it into a word
wordList = [''.join(letters) for letters in product(alphabet, repeat=wordLen)]
print(wordList)

which gives:

['000', '001', '002', '010', '011', '012', '020', '021', '022', '100', '101', '102', '110', '111', '112', '120', '121', '122', '200', '201', '202', '210', '211', '212', '220', '221', '222']

A "synonym" is obtained by... uh... if only I could articulate this...

These lists contain all the possible groups of "synonyms" within wordList:

['000',
 '111',
 '222'] 

['001',
 '002',
 '110',
 '112',
 '220',
 '221']

['010',
 '020',
 '101',
 '121',
 '202',
 '212']

['011',
 '022',
 '100',
 '122',
 '200',
 '211']

['012',
 '021',
 '102',
 '120',
 '201',
 '210']

Sadly, I'm unable to articulate how I obtained the above lists of "synonyms". I would like to do something as above for an arbitrary alphabet forming n-lettered words.


Solution

  • Looks quite easy:

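    import collections

    syns = collections.defaultdict(list)

    for w in wordList:
        # w.index(c) is the position where c first occurs in w,
        # so '001' and '110' both map to (0, 0, 2)
        key = tuple(w.index(c) for c in w)
        syns[key].append(w)

    print(list(syns.values()))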
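
    To spell out why this works: two words get the same tuple exactly when their letters repeat at the same positions, which means one word can be turned into the other by consistently renaming letters. That seems to be the "synonym" relation the question is after. Below is a minimal self-contained sketch of the same idea for an arbitrary alphabet and word length. The names first_occurrence_pattern and group_synonyms are made up for illustration, and the sketch assumes the alphabet contains no repeated letters.

```python
from collections import defaultdict
from itertools import product


def first_occurrence_pattern(word):
    # Replace each letter by the index of its first occurrence in the word,
    # e.g. '110' -> (0, 0, 2) and '001' -> (0, 0, 2).
    return tuple(word.index(c) for c in word)


def group_synonyms(alphabet, word_len):
    # Hypothetical helper: generate every word of length word_len over
    # the alphabet and bucket the words by their pattern.
    groups = defaultdict(list)
    for letters in product(alphabet, repeat=word_len):
        word = ''.join(letters)
        groups[first_occurrence_pattern(word)].append(word)
    return list(groups.values())


groups = group_synonyms('012', 3)
for group in groups:
    print(group)
```

    For alphabet '012' and length 3 this should reproduce the five lists from the question. Note that word.index makes the pattern computation quadratic in the word length, which is unlikely to matter for short words.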