I am trying to delete entire key-value pairs from a dictionary when their values are duplicates based on string similarity. Example:
d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}
In the above example, the values of keys 1 and 2 are similar strings, so one of them must be deleted. The output must be the following, keeping the IDs intact:
d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      3: 'Don t skip leg day y all'}
Please help me solve this issue.
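If "similar" should mean something looser than one string containing the other, one option is a similarity ratio. The sketch below is an assumption, not the method from the answer: it swaps in the standard library's `difflib.SequenceMatcher.ratio()` with a hypothetical `threshold` of 0.7 that you would need to tune for your data. For strings 1 and 2 above, the ratio is 0.78, because the shorter string is a prefix of the longer one. The helper name `dedupe_similar` is illustrative. Longer strings are visited first, so of two similar values the longer one survives with its original ID.

```python
from difflib import SequenceMatcher


def dedupe_similar(d, threshold=0.7):
    """Drop entries whose value is similar to a longer value already kept.

    `threshold` is a hypothetical cutoff for SequenceMatcher.ratio(); tune it.
    """
    kept = []  # (key, value) pairs that survived so far
    # Visit the longest strings first so that, of two similar strings,
    # the longer one is the one that is kept.
    for key, value in sorted(d.items(), key=lambda kv: len(kv[1]), reverse=True):
        if all(SequenceMatcher(None, value, other).ratio() < threshold
               for _, other in kept):
            kept.append((key, value))
    kept_keys = {key for key, _ in kept}
    # Rebuild in the original key order, keeping the original IDs.
    return {k: v for k, v in d.items() if k in kept_keys}


d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}

result = dedupe_similar(d1)
print(result)
```

Every pair of values is compared, so this is quadratic in the number of entries and can be slow for large dictionaries.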
If by "similarity" you mean that one string is contained within another and you want to eliminate the shorter one, you can do it with nested loops as shown below. Note that you should iterate over the original dictionary and delete from a copy: deleting keys from a dictionary while iterating over it raises a RuntimeError ("dictionary changed size during iteration").
d1={1:'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
2:'Colins business partner sends millions of dollars to groups which target lives',
3:'Don t skip leg day y all'}
d2 = dict(d1)  # make a copy of d1
for k, sent in d1.items():
    for sentence in d1.values():
        # drop k if its value is a strict substring of another value
        if sent in sentence and len(sent) != len(sentence):
            del d2[k]
            break
print(d2)
# {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad', 3: 'Don t skip leg day y all'}
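One limitation of the loop above: because of the `len(sent) != len(sentence)` check, identical values are never removed, so `{3: 'x', 4: 'x'}` keeps both entries. A minimal variant that also collapses exact duplicates is sketched below, assuming the earliest key in insertion order should win; `remove_contained` is a hypothetical helper name, not part of any library. It visits values longest first, keeps a value only if no already-kept value contains it, and then restores the original key order.

```python
def remove_contained(d):
    """Drop entries whose value is contained in another entry's value.

    Exact duplicates are collapsed too; the earliest key wins.
    """
    kept = {}
    # sorted() is stable even with reverse=True, so equal-length values
    # keep their original relative order.
    for key, value in sorted(d.items(), key=lambda kv: len(kv[1]), reverse=True):
        # Keep this value only if no value kept so far contains it.
        if not any(value in other for other in kept.values()):
            kept[key] = value
    # Rebuild in the original insertion order so the IDs appear as before.
    return {k: v for k, v in d.items() if k in kept}


d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}

print(remove_contained(d1))
print(remove_contained({3: 'x', 4: 'x'}))  # {3: 'x'}
```

Sorting by length first means a value is only ever compared against values at least as long as itself, which is why a single pass is enough.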