Search code examples
pythonnumpydictionarystring-matching

Creating a dictionary in Naive algorithm for Pattern Searching


I'm trying to find the number of times a certain string pattern is found inside a sequence of characters; for the bioinformatics lovers, it's actually finding the number of times a certain motif occurs along the genome. For this purpose I've found the following Python based function:

def search(pat, txt):
    M = len(pat)
    N = len(txt)
    
    for i in range(N - M + 1):
        j = 0 

        while(j < M):
            if (txt[i + j] != pat[j]):
                break
            j += 1
 
        if (j == M):
            print(f"{pat} found at index ", i)

Which gives me back this kind of result:

GAATC found at index  1734
GAATC found at index  2229
GAATC found at index  2363
GAATC found at index  2388
GAATC found at index  2399
GAATC found at index  2684
GAATC found at index  5634
GAATC found at index  7021
GAGTC found at index  1671
GAGTC found at index  4043

And so on for each pattern (motif). As you can see the pattern (motif) "GAATC" is present 8 times and it's repeated for each position it has been found. I'd like to have on terminal something like this:

GAATC found 8 times
GAGTC found 2 times

And so on. In the title I wrote "Creating a dictionary" supposing it's the best choice but I'm open to all the possible suggestions.

Can you help me? Thank you!


Solution

  • def search(pat, txt):
        M = len(pat)
        N = len(txt)
        
        found = 0
    
        for i in range(N - M + 1):
            if (txt[i:i+M] == pat):
                found += 1
    
        print(f"{pat} found {found} times.")
    

    Or use regular expressions...

    import re
    
    def search(pat, txt):
        found = len(re.findall(f'(?={pat})', text))
    
        print(f"{pat} found {found} times.")