python bioinformatics biopython fuzzy-search

Find a pattern match in a string in Python

I am trying to find an amino acid pattern (B-C or M-D, where '-' can be any letter other than 'P') in a protein sequence, for example 'VATLDSCBACSKVNDNVKNKVKVKNVKMLDHHHV'. The protein sequences are in a FASTA file.

I have tried several approaches without success. The following code is one of them:

import Bio
from Bio import SeqIO

seqs = SeqIO.parse(X, 'fasta')  # read the sequences from the FASTA file
for aa in seqs:
    x = aa.seq  # the record's sequence (.seq is an attribute of a Biopython SeqRecord)
    
    for val, i in enumerate(x):          
        
        if i=='B':    
            if (x[val+2])=='C':
                
                if x[val+1]!='P':
                   pattern=((x[val]:x[val+2])) ## trying to print full sequence B-C

Unfortunately, none of my attempts work. It would be great if someone could help me with this problem.

Solution

Use a regular expression with a negated character class: [^P] matches any single character other than 'P'.

import re

string = 'VATLDSCBACSKVNDNVKNKVKVKNVKMLDHHHV'
print(re.findall(r"B[^P]C|M[^P]D", string))

Output:

['BAC', 'MLD']
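
To apply the same pattern to every record in the FASTA file and also report where each hit starts, something along these lines might work. It is only a sketch: the names MOTIF, find_motifs and scan_fasta are made up for illustration, and it assumes that overlapping hits should be reported too. re.findall returns non-overlapping matches only, so the pattern is wrapped in a lookahead here.

```python
import re

# Lookahead wrapper so that overlapping hits are all reported
# (e.g. 'MBDC' contains both 'MBD' and 'BDC'); a plain findall
# with the unwrapped pattern would return only the first of the two.
MOTIF = re.compile(r"(?=(B[^P]C|M[^P]D))")


def find_motifs(sequence):
    """Return (start_index, matched_text) pairs; indices are 0-based."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(str(sequence))]


def scan_fasta(path):
    """Yield (record_id, hits) for every record in a FASTA file."""
    # Imported here so find_motifs() can be used without Biopython installed.
    from Bio import SeqIO

    for record in SeqIO.parse(path, "fasta"):
        yield record.id, find_motifs(record.seq)


print(find_motifs("VATLDSCBACSKVNDNVKNKVKVKNVKMLDHHHV"))
# prints [(7, 'BAC'), (27, 'MLD')]
```

With the file from the question, `for rec_id, hits in scan_fasta(X): print(rec_id, hits)` would print the hits per record, where X is the path to the FASTA file. If overlapping hits are not wanted, the lookahead can be dropped and the plain pattern used with finditer instead.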