python python-3.x string sequencematcher

Find common fragments in multiple strings using SequenceMatcher


I would like to find the fragments common to all of these strings: strings_list = ['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test']

The following code returns only the first part, "PS1 1", whereas I expected "PS1 Test". Is it possible to get that result using SequenceMatcher? Thank you in advance!

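from difflib import SequenceMatcher

def findCommonStr(strings_list: list) -> str:

    common_str = strings_list[0]

    for i in range(1, len(strings_list)):
        # [0] keeps only the first matching block, so later fragments are dropped
        match = SequenceMatcher(None, common_str, strings_list[i]).get_matching_blocks()[0]
        # common_str is the first sequence, so it is sliced with match.a
        common_str = common_str[match.a: match.a + match.size]

    common_str = common_str.strip()

    return common_str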

Solution

  • This approach does not use SequenceMatcher. If all strings follow the same pattern, you can split them into words on whitespace.

    strings_list = ['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test']
    
    test = []
    for string in strings_list:
      print(string.split())
      test.append(string.split())
    
    >>> ['PS1', '123456', 'Test']
    ['PS1', '758922', 'Test']
    ['PS1', '978242', 'Test']
    

    Now you can take a set intersection to find the common elements. Reference: "Python - Intersection of multiple lists?"

    set(test[0]).intersection(*test[1:])
    
    >>> {'PS1', 'Test'}
    
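    # join them to get a string; a set is unordered, so take the word order from the first string
    common = set(test[0]).intersection(*test[1:])
    ' '.join(word for word in test[0] if word in common)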
    
    >>> PS1 Test
    

    This only works if the strings follow this pattern of whitespace-separated words.

    Function:

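    def findCommonStr(strings_list: list) -> str:

      # split every string into its whitespace-separated words
      all_str = []
      for string in strings_list:
        all_str.append(string.split())

      # words present in every string (a set, so unordered)
      common = set(all_str[0]).intersection(*all_str[1:])

      # keep the word order of the first string
      return ' '.join(word for word in all_str[0] if word in common)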
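
Since the question asks specifically about SequenceMatcher, here is a minimal sketch of one way it could be used, assuming the same whitespace-separated pattern as above. It compares lists of words rather than raw characters, because a character-level comparison can also pick up stray shared characters (the digit 2 occurs in all three numbers here), and it walks every matching block instead of only the first. The names common_words and find_common_str_sm are hypothetical helpers introduced for this illustration, not part of difflib.

```python
from difflib import SequenceMatcher
from functools import reduce


def common_words(a: list, b: list) -> list:
    """Return the words of `a` that fall inside any block shared with `b`."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared = []
    # Walk every matching block, not just the first one; the final
    # zero-size sentinel block contributes nothing.
    for block in matcher.get_matching_blocks():
        shared.extend(a[block.a:block.a + block.size])
    return shared


def find_common_str_sm(strings_list: list) -> str:
    """Hypothetical helper: fold common_words over all strings, word by word."""
    if not strings_list:
        return ""
    tokenized = [s.split() for s in strings_list]
    return " ".join(reduce(common_words, tokenized))


strings_list = ['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test']
print(find_common_str_sm(strings_list))  # PS1 Test
```

Unlike the set intersection, this keeps the words in their original order and respects their relative positions, but it still depends on the strings splitting cleanly into words.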