learning python as I go for school work. Essentially, I need to find the longest repeated substring in a list of string as shown here. I have been reading through this article and have an understanding of what I should do.
so far my implementation is as follows:
def long_rptr_subString(testList):
longSubstring = ''
if len(testList) > 1 and len(testList[0]) > 0:
for i in range(len(testList[0])):
for j in range(len(testList[0])-i+1):
if j > len(longSubstring) and all(testList[0][i:i+j] in x for x in testList):
longSubstring = testList[0][i:i+j]
return longSubstring
now when I call my function with let's say ['slide', 'glidb', 'flidt', 'cridz', 'bidr']
I get the correct result of 'id'
as being my longest substring.
However, when I pass the list ['slide', 'glidb', 'flidt', 'cridz', 'bidr', 'balh', 'tejka', 'djakljskdl', 'blah', 'blah', 'blah']
I don't get any return results. I should be getting back 'blah'
as my longest substring, but I haven't figured out a way to accomplish this. It seems I am only able to match substrings when they are in ALL items of the list. How might I modify my code/logic to get the longest substring that occurs more than once?
Thank you.
You can only match substrings in all
items because that's exactly what you ask for:
all(testList[0][i:i+j] in x for x in testList)
Even if you change that, you can only find the longest substring that is in the first substring, because you only check through testlist[0]
.
Instead, try something like:
def longest_substr(lst):
longest = None
for word in lst:
for i in range(len(word)):
for j in range(i+1, len(word)+1):
if ((longest is None or (j - i > len(longest))) and
sum(word[i:j] in w for w in lst) > 1):
longest = word[i:j]
return longest
This finds the longest substring that's in at least two (> 1
) of the words (or will return None
for an empty list or list of empty strings) and gives me the following results:
>>> longest_substr(['slide', 'glidb', 'flidt', 'cridz', 'bidr'])
'lid'
>>> longest_substr(['slide', 'glidb', 'flidt', 'cridz', 'bidr', 'balh', 'tejka', 'djakljskdl', 'blah', 'blah', 'blah'])
'blah'
Note that, once you remove the requirement that the substring must be in all strings, the longest in your first list of words is actually 'lid'
.