I have this table of values, and I was wondering how I can make the program read each line. For each line that starts with 'a', 'g', 'c', or 'u', I want it to increase the count by one. For this example, the result should be 12.
a 1 0.000 S
g 2 0.260 S
a 3 0.990 S
a 4 0.980 S
c 5 0.000 S
u 6 1.000 S
c 7 0.000 S
a 8 1.000 S
a 9 1.000 T
u 10 0.820 S
a 11 1.000 T
g 12 0.000 S
F 13 1.000 S
S 14 1.000 S
T 15 1.000 S
The code that I tried is below:
rna_residues = ['a','c','g','u']
count_dict = {}
#Making the starting number 0
rna_count = 0
#if any lines of the file starts with one of the rna_residue
if line.startswith(tuple(rna_residues)):
    for residue in line:
        if residue in rna_residues:
            rna_count += 1
    count_dict[line] = [rna_count]
    print count_dict
Somehow, when I run it, I don't get a total count. Each line is printed with a count of 1 instead:
{'a 1 0.000 S\n': [1]}
{'g 2 0.260 S\n': [1]}
{'a 3 0.990 S\n': [1]}
{'a 4 0.980 S\n': [1]}
{'c 5 0.000 S\n': [1]}
{'u 6 1.000 S\n': [1]}
{'c 7 0.000 S\n': [1]}
{'a 8 1.000 S\n': [1]}
{'a 9 1.000 T\n': [1]}
{'u 10 0.820 S\n': [1]}
{'a 11 1.000 T\n': [1]}
{'g 12 0.000 S\n': [1]}
I know this is a lot of information, but are there any tips that can help me with this? Thanks a lot!!
You are using the whole line as a key in the dictionary, so unless you have identical lines, all values will be 1. Why do you need the dictionary at all? I was under the impression that you want to count the number of lines that start with any one of the characters 'a', 'c', 'g', 'u'.
For this, the following code suffices:
rna_residues = ['a','c','g','u']
rna_count = 0
with open('/path/to/file') as opened_file:
    for line in opened_file:
        # or if line[0] in rna_residues
        if any(line.startswith(residue) for residue in rna_residues):
            rna_count += 1
# parenthesized so it runs on both Python 2 and Python 3
print(rna_count)
# 12
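If you also want to know how many lines start with each residue (which may be what the dictionary in the question was meant for, though that is only a guess), a rough sketch along these lines could work. The function name count_rna_lines and the inline sample are made up for illustration; in practice you would pass the opened file object instead of the sample list.

```python
from collections import Counter

# Residues to look for; str.startswith accepts a tuple of prefixes.
RNA_RESIDUES = ('a', 'c', 'g', 'u')


def count_rna_lines(lines):
    """Return (total, per_residue) for lines that start with an RNA residue.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    per_residue = Counter(
        line[0] for line in lines if line.startswith(RNA_RESIDUES)
    )
    return sum(per_residue.values()), per_residue


# Rows copied from the question, used here in place of a real file.
sample = """a 1 0.000 S
g 2 0.260 S
a 3 0.990 S
a 4 0.980 S
c 5 0.000 S
u 6 1.000 S
c 7 0.000 S
a 8 1.000 S
a 9 1.000 T
u 10 0.820 S
a 11 1.000 T
g 12 0.000 S
F 13 1.000 S
S 14 1.000 S
T 15 1.000 S
""".splitlines()

total, per_residue = count_rna_lines(sample)
print(total)
# 12
print(per_residue['a'], per_residue['c'], per_residue['g'], per_residue['u'])
# 6 2 2 2
```

The key here is the first character of the line rather than the whole line, so repeated residues accumulate instead of each line getting its own entry with a value of 1.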