python pattern-matching knuth-morris-pratt

Getting indices of patterns found in a sequence using Knuth-Morris-Pratt?


I am trying to find patterns in a sequence of integers. I found an implementation of the Knuth-Morris-Pratt (KMP) algorithm at this link.

I passed the function a pattern to find in a text, but the output of the KMP function is an object, and I need the indices at which the pattern occurs in the text. I tried inspecting the object's attributes by typing a dot and pressing Tab, but nothing is listed. How can I get the indices?

Edit

Code:

> # Knuth-Morris-Pratt string matching
> # David Eppstein, UC Irvine, 1 Mar 2002
> 
> from __future__ import generators
> 
> def KnuthMorrisPratt(text, pattern):
> 
>     '''Yields all starting positions of copies of the pattern in the text.
>     Calling conventions are similar to string.find, but its arguments can be
>     lists or iterators, not just strings, it returns all matches, not just
>     the first one, and it does not need the whole text in memory at once.
>     Whenever it yields, it will have read the text exactly up to and
>     including the match that caused the yield.'''
> 
>     # allow indexing into pattern and protect against change during yield
>     pattern = list(pattern)
> 
>     # build table of shift amounts
>     shifts = [1] * (len(pattern) + 1)
>     shift = 1
>     for pos in range(len(pattern)):
>         while shift <= pos and pattern[pos] != pattern[pos-shift]:
>             shift += shifts[pos-shift]
>         shifts[pos+1] = shift
> 
>     # do the actual search
>     startPos = 0
>     matchLen = 0
>     for c in text:
>         while matchLen == len(pattern) or \
>               matchLen >= 0 and pattern[matchLen] != c:
>             startPos += shifts[matchLen]
>             matchLen -= shifts[matchLen]
>         matchLen += 1
>         if matchLen == len(pattern):
>             yield startPos

Sample Text: [1, 2, 2, 3, 3, 2, 4, 5, 2, 2, 3, 2]
Sample Pattern: [2, 2, 3]

Sample output: [1, 8] 

Solution

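  • KnuthMorrisPratt is a generator function. Because it uses yield, calling it gives you a generator object rather than a list of indices, which is why inspecting the object shows nothing useful. The function can stay essentially as it is; to get the indices, iterate over the result, for example with a list comprehension: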

    from __future__ import generators  # only needed on Python 2.2; a no-op on later versions
    
    def KnuthMorrisPratt(text, pattern):
    
        pattern = list(pattern)
    
        # build table of shift amounts
        shifts = [1] * (len(pattern) + 1)
        shift = 1
        for pos in range(len(pattern)):
            while shift <= pos and pattern[pos] != pattern[pos-shift]:
                shift += shifts[pos-shift]
            shifts[pos+1] = shift
    
        # do the actual search
        startPos = 0
        matchLen = 0
        for c in text:        
            while matchLen == len(pattern) or matchLen >= 0 and pattern[matchLen] != c:
                startPos += shifts[matchLen]
                matchLen -= shifts[matchLen]
            matchLen += 1
            if matchLen == len(pattern):
                yield startPos
    
    t = [1, 2, 2, 3, 3, 2, 4, 5, 2, 2, 3, 2]
    p = [2, 2, 3]
    print([k for k in KnuthMorrisPratt(t, p)])  # prints [1, 8]
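
    A list comprehension is only one way to consume the generator. As a rough sketch (the variable names first, rest, all_indices and streamed are mine, not from the original code, and the function body is repeated only so the snippet runs on its own), here are a few other options using next(), list() and an iterator as input:

```python
def KnuthMorrisPratt(text, pattern):
    """Yield every start index at which pattern occurs in text."""
    # allow indexing into pattern and protect against change during yield
    pattern = list(pattern)

    # build table of shift amounts
    shifts = [1] * (len(pattern) + 1)
    shift = 1
    for pos in range(len(pattern)):
        while shift <= pos and pattern[pos] != pattern[pos - shift]:
            shift += shifts[pos - shift]
        shifts[pos + 1] = shift

    # do the actual search
    startPos = 0
    matchLen = 0
    for c in text:
        while matchLen == len(pattern) or (matchLen >= 0 and pattern[matchLen] != c):
            startPos += shifts[matchLen]
            matchLen -= shifts[matchLen]
        matchLen += 1
        if matchLen == len(pattern):
            yield startPos


t = [1, 2, 2, 3, 3, 2, 4, 5, 2, 2, 3, 2]
p = [2, 2, 3]

# Calling the function only creates a generator object; no searching happens yet.
matches = KnuthMorrisPratt(t, p)

# Pull one index at a time with next() ...
first = next(matches)  # 1

# ... or collect whatever is left with list(). A generator can be consumed
# only once, so this continues after the first match.
rest = list(matches)  # [8]

# A fresh call gives all indices in one go.
all_indices = list(KnuthMorrisPratt(t, p))  # [1, 8]

# The text can also be an iterator (a stand-in here for streamed input),
# because the function reads it one element at a time.
streamed = list(KnuthMorrisPratt(iter(t), p))  # [1, 8]

print(first, rest, all_indices, streamed)
```

    If you only need the first occurrence, next(KnuthMorrisPratt(t, p), None) should give you that index, or None when there is no match, without scanning the rest of the text.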