I have a DNA sequence:
seq='AACGTTCAA'
I want to count how many letters are equal to the next one. In this example I should get 3 (because of AA-TT-AA).
In my first try I found out that this doesn't work, because i is a string and 1 an integer.
seq='AACGTTCAA'
count=[]
for i in seq:
    if i == i+1: #neither i+=1
        count.append(True)
    else: count.append(False)
print(sum(count))
So I tried this:
seq='AACGTTCAA'
count=[]
for i in seq:
    if i == seq[seq.index(i)+1]:
        count.append(True)
    else: count.append(False)
print(sum(count))
Then I get this output, which I cannot understand. Three of these True values should be False (1, 5, 8), especially 8, as it is the last element of the string.
6
[True, True, False, False, True, True, False, True, True]
I thought about doing this with arrays, but I think there might be an easy way to do this with just strings. Thanks
To answer your question: the statement for i in seq yields a series of one-character strings: 'A', 'A', 'C', and so on.
So in your first attempt, when you compare i == i+1,
you are adding 1 to a string, which raises a TypeError.
In your second example, the test if i == seq[seq.index(i)+1]
gives a wrong result, because seq.index(i) always returns the index of the first occurrence of the value, not the position the loop is currently at.
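As a quick illustration of that second point, here is a small sketch using the sequence from the question. The variable names (first_a, compared_with) are made up for this demonstration only:

```python
seq = 'AACGTTCAA'

# str.index returns the position of the first match only.
first_a = seq.index('A')  # 0, no matter which 'A' the loop is on
first_t = seq.index('T')  # 4, for both 'T's

# The letter each loop iteration actually gets compared with:
compared_with = [seq[seq.index(ch) + 1] for ch in seq]
print(compared_with)
```

Every 'A', including the last one, ends up being compared with seq[1], which is why the loop reports True at the end of the string instead of failing or returning False.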
To do what you want on a basic level you can do the following:
def countPairedLetters(seq):
    count = 0
    for i in range(1, len(seq)):
        # i starts with 1 and ends with len(seq)-1
        if seq[i-1] == seq[i]:
            count += 1
    return count
Note: by starting at index 1 and comparing each letter with the one before it, you never index past the end of the sequence.
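If you prefer to avoid indices altogether, one possible alternative is to pair each letter with its successor using zip. This is only a sketch of the same idea, and the function name count_adjacent_pairs is my own:

```python
def count_adjacent_pairs(seq):
    # zip(seq, seq[1:]) pairs every letter with the one that follows it;
    # it stops at the shorter input, so the last letter is never paired
    # with anything past the end of the string.
    return sum(a == b for a, b in zip(seq, seq[1:]))

print(count_adjacent_pairs('AACGTTCAA'))  # 3
```

Summing the comparisons works because True counts as 1 and False as 0, which is the same trick your sum(count) was relying on.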