How to remove duplicates and unify values in lists where values are very close to each other in Python?

I have Python lists like the ones below:

x1 = ['lock-service',
 'Capo Service',


x2 = ['journal-service',

As you can see in both lists, duplicate values appear in different forms, for example:

In list 1:

  • 'xyz-reporting-service' and 'reporting-service'
  • 'harbor-service' and 'harbor-service-prod'
  • 'capo-service' and 'Capo Service'
  • 'artifactory-service' and ''
  • 'rocket-chat-service' and 'rocketchat-service'

In list 2:

  • 'xyz-reporting-service' and 'reporting-service'
  • 'rocket-chat-service' and 'rocketchat-service'
  • 'ansible-service' and 'ansible-dpservice'

I need a universal solution, one that works not only on these sample lists, that:

  • removes duplicated values like the samples presented above
  • unifies the values in the list to the name-service form

How can I do that in Python 3.11?


  • My solution adds cleanup steps before and after fuzzy matching. Shoutout to @Scott Boston: I learned about assigning variables inside a comprehension (the := operator) from his answer.

    # Install first: pip install rapidfuzz (in a notebook: !pip install rapidfuzz)
    import re
    from rapidfuzz import fuzz, utils

    def dedup(lst):
        # Trim anything after "-service" (e.g. "harbor-service-prod"), then drop exact duplicates
        lst = list({re.sub(r'-service.*$', '-service', x) for x in lst})
        # For every value already in name-service form, collect its fuzzy matches (score >= 90)
        vals = {val1: {val2: ratio for val2 in lst
                       if val1 != val2  # avoid matching to self
                       and (ratio := fuzz.WRatio(val1, val2, processor=utils.default_process)) >= 90}
                for val1 in lst
                if len(subs := val1.split('-')) == 2  # name-service format requested by OP
                and subs[-1] == 'service'}  # check that it ends in "-service"
        # Everything that is either a key or a fuzzy match of some key
        captured = set(vals) | {match for matches in vals.values() for match in matches}
        not_captured = [x for x in lst if x not in captured]
        # Keep the keys; force leftovers into name-service form by merging extra "-" parts
        new_x = list(vals) + [''.join(x.replace('-service', '').split('-')) + '-service' for x in not_captured]
        return new_x  # returns only the deduplicated list
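
    If installing RapidFuzz is not an option, the same clean-then-fuzzy-match idea can be roughly sketched with only the standard library. This is not the function above: it swaps WRatio for difflib.SequenceMatcher, which scores differently, so the 0.85 threshold is my assumption and would need tuning on real data. The names canonical and dedup_stdlib are hypothetical, and the sample list is built only from the pairs mentioned in the question, not from the OP's full lists.

```python
import re
from difflib import SequenceMatcher


def canonical(name):
    """Lowercase, turn spaces/underscores into '-', and trim anything after '-service'."""
    name = re.sub(r"[\s_]+", "-", name.strip().lower())
    return re.sub(r"-service.*$", "-service", name)


def similar(a, b, threshold):
    """True when difflib's similarity ratio (0.0 to 1.0) reaches the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def dedup_stdlib(names, threshold=0.85):
    """Keep one representative per group of near-identical names (assumed threshold)."""
    kept = []
    # Visit shorter names first so the plainest spelling becomes the representative.
    for name in sorted({canonical(n) for n in names}, key=lambda s: (len(s), s)):
        if not any(similar(name, k, threshold) for k in kept):
            kept.append(name)
    return sorted(kept)


# Sample assembled from the pairs listed in the question (an assumption, not the real lists).
sample = [
    'xyz-reporting-service', 'reporting-service',
    'harbor-service', 'harbor-service-prod',
    'capo-service', 'Capo Service',
    'rocket-chat-service', 'rocketchat-service',
    'ansible-service', 'ansible-dpservice',
]

print(dedup_stdlib(sample))
# ['ansible-service', 'capo-service', 'harbor-service', 'reporting-service', 'rocketchat-service']
```

    Because every name shares the "-service" suffix, short names score as fairly similar to each other, so a lower threshold risks merging unrelated services. Check the result on your own data before trusting it.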