Search code examples
pythonlistmatchlongest-substring

Retrieve the longest matching value from a list by matching with another list [Python 2.7]


There are two lists to be matched, li_a is the given list consists of sequence of characters of a sentence, whereas li_b is the collection of words.

li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']


li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']

The process is to match items of li_a iteratively, with li_b items. If the first character of li_a is similar to li_b items, the first character of li_a joins with next character, and redo the process until it reaches to its longest match. Then, the longest term should be split, and the process will continue till the end. As the unknown characters and words of li_a which do not appear in li_b will be appended as they were before.

The final work should be like this:

new_li = ['These','45','are','cars']

The attempt getting so far, but this works for two strings not for Lists, and it doesn't retrieve unidentified words.

def longest_matched_substring(s1, s2):
    m = [[0] * (1 + len(s2)) for i in xrange(1 + len(s1))]
    longest, x_longest = 0, 0
    for x in xrange(1, 1 + len(s1)):
       for y in xrange(1, 1 + len(s2)):
           if s1[x - 1] == s2[y - 1]:
               m[x][y] = m[x - 1][y - 1] + 1
               if m[x][y] > longest:
                   longest = m[x][y]
                   x_longest = x
           else:
               m[x][y] = 0

    return s1[x_longest - longest: x_longest]

Solution

  • You can do it using two for loops and a temp variable as the following:

    def longest_matched_substring(li1, li2):
        new_li = []
        tmp = ''
        for a in li1:
            tmp += a
            count = 0
            for b in li2:
                if tmp == b:
                    count += 1
            if count == 0:
                tmp1 = tmp.replace(a, '')
                new_li.append(tmp1)
                tmp = a
        if li2.__contains__(tmp):
            new_li.append(tmp) 
        return new_li
    

    INPUT:

    li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
    li_b = ['T','Th','The','Thes','These','a','ar','are','c','ca','car','cars']
    print longest_matched_substring(li_a, li_b)
    

    OUTPUT:

    ['These', '45', 'are', 'cars']
    

    as for the new scenario, you can modify the function as the following:

    def longest_matched_substring(li1, li2):
        new_li = []
        tmp = ''
        for a in li1:
            tmp += a
            count = 0
            for b in li2:
                if tmp == b:
                    count += 1
            if count == 0:
                tmp1 = tmp.replace(a, '')
    
                new_li.append(tmp1)
                tmp = a
        if li_b.__contains__(tmp):
            new_li.append(tmp) 
        for e1 in new_li:
            tmp2 = e1
            rm = []
            for e2 in new_li:
                if e1 != e2:
                    tmp2 += e2
                    rm.append(e2)
                    if tmp2 in li2:
                        new_li.insert(new_li.index(e1), tmp2) # if order matters
                        #new_li.append(tmp2) if order doesn't matter
                        for r in rm:
                            new_li.remove(r)
                        new_li.remove(e1)
                        rm = []
                        break
        return new_li
    

    INPUT:

    li_a = ['T','h','e','s','e','45','a','r','e','c','a','r','s']
    li_b = ['T','Th','The','These','a','ar','are','c','ca','car','cars']
    print longest_matched_substring(li_a, li_b)
    

    OUTPUT:

    ['These', '45', 'are', 'cars']