Search code examples
pythonpython-3.xalgorithmsubstringpython-3.8

Common substring in list of strings


i encountered a problem while trying to solve a problem where given some strings and their lengths, you need to find their common substring. My code for the part where it loops through the list and then each through each word in it is this:

num_of_cases = int(input())
for i in range(1, num_of_cases+1):
    if __name__ == '__main__':
        len_of_str = list(map(int, input().split()))
        len_of_virus = int(input())

    strings = []
    def string(strings, len_of_str):
        len_of_list = len(len_of_str)
        for i in range(1, len_of_list+1):
            strings.append(input())
    
    lst_of_subs = []
    virus_index = []
    def substr(strings, len_of_virus):
        for word in strings:
             for i in range(len(len_of_str)):
                  leng = word[i:len_of_virus]
                  lst_of_subs.append(leng)
                  virus_index.append(i)

    print(string(strings, len_of_str))
    print(substr(strings, len_of_virus))

And it prints the following given the strings: ananasso, associazione, tassonomia, massone

['anan', 'nan', 'an', 'n', 'asso', 'sso', 'so', 'o', 'tass', 'ass', 'ss', 's', 'mass', 'ass', 'ss', 's']

It seems that the end index doesn't increase, although i tried it by writing len_of_virus += 1 at the end of the loop.

sample input:

1
8 12 10 7
4
ananasso
associazione
tassonomia
massone

where the 1st letter is the number of cases, the second line is the name of the strings, 3rd is the length of the virus(the common substring), and then there are the given strings that i should loop through.

expected output:

Case #1: 4 0 1 1

where the four numbers are the starting indexes of the common substring.(i dont think that code for printing cares us for this particular problem)

What should i do? Please help!!


Solution

  • The problem, beside defining functions in odd places and using said function to get side effect in ways that aren't really encourage, is here:

    for i in range(len(len_of_str)):
        leng = word[i:len_of_virus]
    

    i constantly increase in each iteration, but len_of_virus stay the same, so you are effectively doing

    word[0:4] #when len_of_virus=4
    word[1:4]
    word[2:4]
    word[3:4]
    ...
    

    that is where the 'anan', 'nan', 'an', 'n', come from the first word "ananasso", and the same for the other

    >>> word="ananasso"
    >>> len_of_virus = 4
    >>> for i in range(len(word)):
            word[i:len_of_virus]
    
        
    'anan'
    'nan'
    'an'
    'n'
    ''
    ''
    ''
    ''
    >>> 
    

    you can fix it moving the upper end by i, but that leave with the same problem in the other end

    >>> for i in range(len(word)):
        word[i:len_of_virus+i]
    
        
    'anan'
    'nana'
    'anas'
    'nass'
    'asso'
    'sso'
    'so'
    'o'
    >>>
    

    so some simple adjustments in the range and problem solve:

    >>> for i in range(len(word)-len_of_virus+1):
            word[i:len_of_virus+i]
    
        
    'anan'
    'nana'
    'anas'
    'nass'
    'asso'
    >>> 
    

    Now that the substring part is done, the rest is also easy

    >>> def substring(text,size):
            return [text[i:i+size] for i in range(len(text)-size+1)]
    
    >>> def find_common(lst_text,size):
            subs = [set(substring(x,size)) for x in lst_text]
            return set.intersection(*subs)
    
    >>> test="""ananasso
    associazione
    tassonomia
    massone""".split()
    >>> find_common(test,4)
    {'asso'}
    >>> 
    

    To find the common part to all the strings in our list we can use a set, first we put all the substring of a given word into a set and finally we intersect them all.

    the rest is just printing it to your liking

    >>> virus = find_common(test,4).pop()
    >>> print("case 1:",*[x.index(virus) for x in test])
    case 1: 4 0 1 1
    >>>