Python dict and for loop with FASTA file

I was given a FASTA-formatted file (like the ones from this site: http://www.uniprot.org/proteomes/) containing the protein sequences of a certain bacterium. I have been asked to give the full count and the relative percentage of each single-letter amino acid code in the file, and return the results like:

L: 139002 (10.7%)
A: 123885 (9.6%)
G: 95475 (7.4%)
V: 91683 (7.1%)
I: 77836 (6.0%)

What I have so far:

#!/usr/bin/python
ecoli = open("/home/file_pathway").read()
counts = dict()
for line in ecoli:
    words = line.split()
    for word in words:
        if word in ["A", "C", "D", "E", "F", "G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y"]:
            if word not in counts:
                counts[word] = 1
            else:
                counts[word] += 1

for key in counts:
    print key, counts[key]

I believe this counts every capital letter in the file, not just those in the amino acid sequences. How can I limit it to just the sequences? I am also having trouble working out how to calculate each code's share of the total.

Solution

The only lines that don't contain what you want are the header lines, which start with >. Just skip those:

from collections import defaultdict

# the 20 standard one-letter amino acid codes
amino_acids = set("ACDEFGHIKLMNPQRSTVWY")
counts = defaultdict(int)
with open("input.fasta") as ecoli:  # will close your file automatically
    for line in ecoli:  # iterate over the file object, no need to read all contents into memory
        if line.startswith(">"):  # skip header lines, which start with >
            continue
        for char in line:  # just iterate over the characters in the line
            if char in amino_acids:
                counts[char] += 1
total = float(sum(counts.values()))  # float avoids integer division on Python 2
for key, val in counts.items():
    print("{}: {} ({:.1%})".format(key, val, val / total))

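You could also use a collections.Counter, since the sequence lines contain only residue codes. Note that this counts every character on those lines, so any non-standard code in the file (such as X) is included in the totals: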

from collections import Counter

counts = Counter()
with open("input.fasta") as ecoli:  # will close your file automatically
    for line in ecoli:  # iterate over the file object, no need to read all contents into memory
        if line.startswith(">"):  # skip header lines, which start with >
            continue
        counts.update(line.rstrip())  # count every character, minus the trailing newline
total = float(sum(counts.values()))  # float avoids integer division on Python 2
for key, val in counts.most_common():  # most frequent first, as in the requested output
    print("{}: {} ({:.1%})".format(key, val, val / total))
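
If you want the output sorted by frequency and limited to the 20 standard codes, the two approaches can be combined. The following is only a minimal sketch: the function name residue_report, the AMINO_ACIDS set and the sample lines are made up for illustration, and it assumes that any character outside the 20 standard codes should be ignored rather than counted.

```python
from collections import Counter

# The 20 standard one-letter amino acid codes (assumption: anything else is ignored).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def residue_report(lines):
    """Return 'code: count (percent)' strings, most frequent residue first.

    `lines` is any iterable of FASTA lines, e.g. an open file object.
    """
    counts = Counter()
    for line in lines:
        if line.startswith(">"):  # header line, not sequence data
            continue
        # Count only the standard codes found on this sequence line.
        counts.update(ch for ch in line.strip() if ch in AMINO_ACIDS)
    total = float(sum(counts.values()))  # float avoids integer division on Python 2
    return ["{}: {} ({:.1%})".format(code, n, n / total)
            for code, n in counts.most_common()]


# Tiny made-up sample, not real proteome data.
sample = [">sp|TEST|made-up header with CAPITALS\n", "LLLA\n", "LAGX\n"]
report = residue_report(sample)
print("\n".join(report))
```

With a real file you would pass the open file object instead of the sample list, e.g. residue_report(ecoli) inside the with block.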