I'm trying to find the number of times a certain string pattern is found inside a sequence of characters; for the bioinformatics lovers, it's actually finding the number of times a certain motif occurs along the genome. For this purpose I've found the following Python based function:
def search(pat, txt):
M = len(pat)
N = len(txt)
for i in range(N - M + 1):
j = 0
while(j < M):
if (txt[i + j] != pat[j]):
break
j += 1
if (j == M):
print(f"{pat} found at index ", i)
Which gives me back this kind of result:
GAATC found at index 1734
GAATC found at index 2229
GAATC found at index 2363
GAATC found at index 2388
GAATC found at index 2399
GAATC found at index 2684
GAATC found at index 5634
GAATC found at index 7021
GAGTC found at index 1671
GAGTC found at index 4043
And so on for each pattern (motif). As you can see the pattern (motif) "GAATC" is present 8 times and it's repeated for each position it has been found. I'd like to have on terminal something like this:
GAATC found 8 times
GAGTC found 2 times
And so on. In the title I wrote "Creating a dictionary" supposing it's the best choice but I'm open to all the possible suggestions.
Can you help me? Thank you!
def search(pat, txt):
M = len(pat)
N = len(txt)
found = 0
for i in range(N - M + 1):
if (txt[i:i+M] == pat):
found += 1
print(f"{pat} found {found} times.")
Or use regular expressions...
import re
def search(pat, txt):
found = len(re.findall(f'(?={pat})', text))
print(f"{pat} found {found} times.")