
calculate percentage of strings instead of statistics


I would like to write the percentage of each letter in my file instead of its raw count. How can I modify the following code?

stat_file = open(filename, 'w')
one_letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
for letter in one_letter:
    stat_file.writelines('%s : %d \n' % (letter, statistics[letter]))

Thanks in advance!


Solution

  • First, the total letter count can mean two things:

    1. Only the letters in one_letter (only 'A-Z' in your question)

    one_letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    stat = {'A': 5,
            'B': 3,
            'C': 9,
            'U': 5,
            'D': 9,
            'a': 99}
    
    total_count = sum(stat.get(letter, 0) for letter in one_letter) # should be 31
    

    2. All the characters in your file (including 'a-z', '0-9', ...)

    total_count = sum(stat.values()) # should be 130
    


    After that, you can compute and write each percentage with:

    for letter in one_letter:
        # float() keeps the division from truncating on Python 2
        percentage = stat.get(letter, 0) / float(total_count) * 100
        stat_file.write("%s: %f%%\n" % (letter, percentage))
    

    Note that stat.get(letter, 0) is only there in case some letters are missing from stat.

    You can replace it with stat[letter] if you are sure that every letter A-Z is in stat.
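
Putting the steps above together, here is a minimal self-contained sketch using the first meaning of the total (only A-Z). The helper name letter_percentages and the guard for an empty total are assumptions added for illustration; they are not part of the original code. The stat dictionary is the sample data from above.

```python
import string


def letter_percentages(stat, letters=string.ascii_uppercase):
    """Return {letter: percentage} using only the counts of `letters` in `stat`."""
    total_count = sum(stat.get(letter, 0) for letter in letters)
    if total_count == 0:
        # Assumption: report 0% everywhere rather than dividing by zero.
        return {letter: 0.0 for letter in letters}
    # 100.0 forces float division, so this also behaves on Python 2.
    return {letter: 100.0 * stat.get(letter, 0) / total_count
            for letter in letters}


# Sample counts from the answer above; 'a' is ignored because it is not in A-Z.
stat = {'A': 5, 'B': 3, 'C': 9, 'U': 5, 'D': 9, 'a': 99}

percentages = letter_percentages(stat)
lines = ["%s: %.2f%%\n" % (letter, percentages[letter])
         for letter in string.ascii_uppercase]

print("".join(lines), end="")
```

To write to the file instead of printing, pass the list to stat_file.writelines(lines), which is what writelines is meant for (a sequence of strings). For the second meaning of the total, replace the sum inside the helper with sum(stat.values()).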