Search code examples
pythonbiopythonnaivebayes

How to create a naive exact matching code that allows at least 2 mismatch?


For example, ACTTTA occurs twice in ACTTACTTGATAAAGT, once at offset 0 with 2 mismatches, and once at offset 4 with 1 mismatch. So naive_2mm('ACTTTA', 'ACTTACTTGATAAAGT') should return the list |[0, 4].

I'm still a newbie. I've been working on this problem for almost a week now and I can't figure this out on my own. This is the code that I develop. Can someone explain why this code does not work and how can I do this?

def naive_2mm(p,t):
occurences = []
counter = 0
for i in range(len(t)-len(p)+1):
    while counter != (len(p)-2):
      for j in range(len(p)):
        if t[i+j] == p[j]:
          counter += 1
          continue
else:
   occurences.append(i)
return occurences

Solution

  • Your code is pretty close. I think the counter initialization is just about the only issue. This works:

    def naive_2mm(p,t):
        occurrences = []
        for i in range(len(t)-len(p)+1):
            counter = 0
            for j,k in zip(p, t[i:i+len(p)]):
                counter += j==k
            if counter >= len(p)-2:
                occurrences.append(i)
        return occurrences
    
    needle = 'ACTTTA' 
    haystack = 'ACTTACTTGATAAAGT'
    
    print(naive_2mm(needle,haystack))
    

    Or, for those of you who like one-liners, you can replace the 3-line sequence creating counter with:

            counter = sum(j==k for j,k in zip(p, t[i:i+len(p)]))