I have two sets of protein sequence data. The two sequences below look the same, but they actually differ by one amino acid (letter).
For example:
File 1:
TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISEVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K
File 2:
TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISVVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K
Desired output:
File 1:
E
File 2:
V
I'm aware that grep, comm, and diff can print the differences between two sets of data, but they compare line by line. In this situation, how do I print only the letters that differ between the two sequences? Thanks.
I don't think you need the re module here. A simple loop is enough.
    file1 = 'TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISEVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K'
    file2 = 'TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISVVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K'

    # Walk both sequences by index and print each pair of letters that differ.
    for i in range(len(file1)):
        if file1[i] != file2[i]:
            print(file1[i], file2[i])
The output is: E V
Here, we compare the two sequences letter by letter. Note that this assumes both sequences have the same length.
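If you also want to know where the difference is, or the sequences might differ in length, here is a minimal sketch of the same letter-by-letter comparison. The function name diff_positions and the '-' fill character are my own choices for illustration, not anything standard; positions are counted from 1.

```python
from itertools import zip_longest

def diff_positions(seq1, seq2, fill='-'):
    """Return (position, letter1, letter2) for every mismatch, 1-based.

    If one sequence is shorter, the missing letters are shown as `fill`
    (an assumption made for this sketch).
    """
    pairs = zip_longest(seq1, seq2, fillvalue=fill)
    return [(pos, a, b)
            for pos, (a, b) in enumerate(pairs, start=1)
            if a != b]

file1 = 'TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISEVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K'
file2 = 'TV*TV*TV*TISTI*VWGKIGIRIE*PWIVSISVVESVACNSKNSNNNSE*K**FSEHFDLNYEN*K'

# Print the position and the differing letter from each sequence.
for pos, a, b in diff_positions(file1, file2):
    print(pos, a, b)
```

To run it on real files, read each file into a string first, for example with open(path).read().strip(), and pass the two strings to diff_positions.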