Search code examples
pythoncountsubstring

Python: how to count the occurrence of specific pattern with overlap in a list or string?


My problem is logically easy, but hard to implement for me. I have a list of numbers,(or you could say a string of numbers, it is not hard to transfer between strings and lists) I'd like to count the occurrence of some specific patterns with overlap. For example, the code is below:

A = [0, 1, 2 ,4, 5, 8, 4, 4, 5, 8, 2, 4, 4, 5, 5, 8, 9, 10, 3, 2]

For "4,5,8" occurs, then I count a1 = 1, a2 = 1, a3 = 1. For "4,4,5,8" occurs, then I count a1 = 2, a2 = 1, a3 = 1. For "4,4,5,5,5,8,8,8" occurs, I count a1 = 2, a2 = 3, a3 = 3. That is to say, for a pattern, you count if the pattern at least include "4,5,8" in this order. "4,5,9" doesn't count. "4,4,4,5,5,2,8" doesn't count at all. For "4,5,4,5,8", a1 = 1, a2 = 1, a3 = 1.

Thank you all for your help.


Solution

  • You can match patterns like this using regular expressions.

    https://regexr.com/ is a super helpful tool for experimenting with/learning about regular expressions.

    The built in module re does the job:

    import re
    
    def make_regex_object(list_of_chars):
        # make a re object out of [a, b... n]: 'a+b+ ... n+''    (note the '+' at end)
        # the '+' means it matches one or more occurrence of each character
        return re.compile('+'.join([str(char) for char in list_of_chars]) + '+')
    
    searcher = make_regex_object(['a', 'b', 'c', 'd'])
    searcher.pattern  # 'a+b+c+d+'
    
    x = searcher.search('abczzzzabbbcddefaaabbbccceeabc')   
    # caution - search only matches first instance of pattern
    
    print(x)   # <_sre.SRE_Match object; span=(7, 14), match='abbbcdd'>
    x.end()    # 14
    x.group()  # 'abbbcdd'
    

    You could then repeat this on the remainder of your string if you want to count multiple instances of the pattern. You can count character occurrences with x.group().count(char) or something better.