Tags: python, string, parsing, pattern-matching, cut

python pattern cutting of strings in a list


I have a dictionary variable "d" whose keys are integers and whose values are lists of strings:

368501900 ['GH131.hmm  ', 'CBM1.hmm  ']
368499531 ['AA8.hmm  ']
368500556 ['AA7.hmm  ']
368500559 ['GT2.hmm  ']
368507728 ['GH16.hmm  ']
368496466 ['AA2.hmm  ']
368504803 ['GT21.hmm  ']
368503093 ['GT1.hmm  ', 'GT4.hmm  ']

The code is like this:

d = dict()  # filled elsewhere with the data shown above

for key in d:
    dictValue = d[key]

    dictMerged = list(sorted(set(dictValue), key=dictValue.index))
    print (key, dictMerged)

However, I want to remove the digits and everything after them from each string in the lists, so that the result looks like this:

368501900 ['GH', 'CBM']
368499531 ['AA']
368500556 ['AA']
368500559 ['GT']
368507728 ['GH']
368496466 ['AA']
368504803 ['GT']
368503093 ['GT']

I think the code should be inserted between dictValue and dictMerged, but I cannot work out the logic. Any ideas?


Solution

  • Import re at the beginning:

        import re
    

    Then add this line between dictValue and dictMerged; it removes the first digit and everything after it from each string:

        new_dict_value = [re.sub(r'\d.*', '', x) for x in dictValue]
    

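    Finally, use new_dict_value in place of dictValue on the dictMerged line, both in set(new_dict_value) and in key=new_dict_value.index. The shortened strings are not in dictValue, so dictValue.index would raise a ValueError.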
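
Putting the steps together, here is a minimal sketch of how the whole loop might look. The sample dictionary is a subset of the data from the question, and the helper name strip_family_suffix is only an illustrative choice, not something from the original code.

```python
import re

# Sample data copied from the question; in real use d is built elsewhere.
d = {
    368501900: ['GH131.hmm  ', 'CBM1.hmm  '],
    368499531: ['AA8.hmm  '],
    368503093: ['GT1.hmm  ', 'GT4.hmm  '],
}


def strip_family_suffix(names):
    """Hypothetical helper: cut each string at its first digit,
    then drop duplicates while keeping first-seen order."""
    # r'\d.*' matches the first digit and everything after it.
    prefixes = [re.sub(r'\d.*', '', name) for name in names]
    # Same de-duplication idiom as the question, applied to the new list.
    return sorted(set(prefixes), key=prefixes.index)


result = {key: strip_family_suffix(values) for key, values in d.items()}

for key, merged in result.items():
    print(key, merged)
```

Because the digits are stripped before de-duplication, 'GT1.hmm  ' and 'GT4.hmm  ' both reduce to 'GT' and collapse into a single entry, which matches the desired output for key 368503093.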