Search code examples
pythonstringloopslongest-substring

Longest repeated substring


learning python as I go for school work. Essentially, I need to find the longest repeated substring in a list of string as shown here. I have been reading through this article and have an understanding of what I should do.

so far my implementation is as follows:

def long_rptr_subString(testList):
    longSubstring = ''

    if len(testList) > 1 and len(testList[0]) > 0:
        for i in range(len(testList[0])):
            for j in range(len(testList[0])-i+1):
                if j > len(longSubstring) and all(testList[0][i:i+j] in x for x in testList):
                    longSubstring = testList[0][i:i+j]
    return longSubstring

now when I call my function with let's say ['slide', 'glidb', 'flidt', 'cridz', 'bidr'] I get the correct result of 'id' as being my longest substring.

However, when I pass the list ['slide', 'glidb', 'flidt', 'cridz', 'bidr', 'balh', 'tejka', 'djakljskdl', 'blah', 'blah', 'blah'] I don't get any return results. I should be getting back 'blah' as my longest substring, but I haven't figured out a way to accomplish this. It seems I am only able to match substrings when they are in ALL items of the list. How might I modify my code/logic to get the longest substring that occurs more than once?

Thank you.


Solution

  • You can only match substrings in all items because that's exactly what you ask for:

    all(testList[0][i:i+j] in x for x in testList)
    

    Even if you change that, you can only find the longest substring that is in the first substring, because you only check through testlist[0] .

    Instead, try something like:

    def longest_substr(lst):
        longest = None
        for word in lst:
            for i in range(len(word)):
                for j in range(i+1, len(word)+1):
                    if ((longest is None or (j - i > len(longest))) and
                        sum(word[i:j] in w for w in lst) > 1):
                        longest = word[i:j]
        return longest
    

    This finds the longest substring that's in at least two (> 1) of the words (or will return None for an empty list or list of empty strings) and gives me the following results:

    >>> longest_substr(['slide', 'glidb', 'flidt', 'cridz', 'bidr'])
    'lid'
    >>> longest_substr(['slide', 'glidb', 'flidt', 'cridz', 'bidr', 'balh', 'tejka', 'djakljskdl', 'blah', 'blah', 'blah'])
    'blah'
    

    Note that, once you remove the requirement that the substring must be in all strings, the longest in your first list of words is actually 'lid'.