Tags: python, dictionary, nlp, similarity

How to delete duplicate key-value (string) pairs from a dictionary?


I am trying to delete whole key-value pairs from a dictionary when their values are found to be duplicates based on string similarity. Example:

d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives '
         'for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}

In the above code, the strings under keys 1 and 2 are similar, so one of them must be deleted. The output should be the following, with the remaining IDs kept intact:

d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives '
         'for gruesome deaths domestically and abroad',
      3: 'Don t skip leg day y all'}

Please help me solve this issue.


Solution

  • If by "similarity" you mean that one string is contained within another and you want to eliminate the shorter one, you can do it by nested loops as shown below. Note that you want to make a copy of your dictionary so that you don't change the original dictionary during iteration.

    d1={1:'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
    2:'Colins business partner sends millions of dollars to groups which target lives',
    3:'Don t skip leg day y all'}
    
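    d2 = dict(d1)  # make a copy of d1 so that d1 is not modified while iterating
    for k, sent in d1.items():
        for sentence in d1.values():
            # sent is a strict substring of a longer sentence: drop the shorter entry
            if sent in sentence and len(sent) != len(sentence):
                del d2[k]
                break  # k is already removed, no need to compare further
    print(d2)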
    # {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad', 3: 'Don t skip leg day y all'}
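
  • If by "similarity" you mean something looser than exact containment (for example, near-duplicates that differ by a typo), one possible approach is a sketch like the one below. It uses `difflib.SequenceMatcher` from the standard library. The helper name `drop_similar` and the `threshold` value of 0.8 are assumptions made for illustration, not something from the question, so tune the threshold to your data. Containment is still checked explicitly, because a much shorter substring can score below the threshold on ratio alone.

```python
from difflib import SequenceMatcher


def drop_similar(d, threshold=0.8):
    """Return a copy of d without entries that are contained in,
    or very similar to, an already kept entry.

    `threshold` is an assumed cutoff for SequenceMatcher.ratio().
    """
    kept = {}
    # Visit longer strings first so the longer member of a similar pair survives.
    by_length = sorted(d.items(), key=lambda kv: len(kv[1]), reverse=True)
    for key, text in by_length:
        is_duplicate = any(
            text in other
            or SequenceMatcher(None, text, other).ratio() >= threshold
            for other in kept.values()
        )
        if not is_duplicate:
            kept[key] = text
    # Rebuild in the original key order, keeping the original IDs.
    return {k: v for k, v in d.items() if k in kept}


d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives '
         'for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}

d2 = drop_similar(d1)
print(sorted(d2))
# [1, 3]
```

    Like the nested-loop version, this compares every string against the others, so it is quadratic in the number of entries. That is fine for small dictionaries but may be slow for very large ones.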