Tags: python, dictionary, histogram, cumulative-sum, cumulative-frequency

Cumulative distribution in a dictionary


I'm trying to calculate a cumulative distribution and store it in a dictionary. The distribution should take the letters of a given text, compute each letter's probability from how often it appears in the text, and from those probabilities calculate the cumulative distribution. I don't know if I'm doing it the right way, but here's my code:

with open('text') as infile:
    text = infile.read()

letters = list(text)
letter_freqs = Counter(letters(text))
letter_sum = len(letters) 
letter_proba = [letter_freqs[letter]/letter_sum for letter in letters(text)]

Now I want to calculate the cumulative distribution and plot it as a histogram. Can someone help me?


Solution

  • The following should at least run (which your code as posted won't):

    import collections, itertools
    
    with open('text') as infile:
        letters = list(infile.read())  # not just letters: whitespace & punct, too
        letter_freqs = collections.Counter(letters)
        letter_sum = len(letters)
        letters_set = sorted(set(letters))  # distinct characters, in sorted order
        # probability of each distinct character
        d = {ch: letter_freqs[ch] / letter_sum for ch in letters_set}
        # running total of those probabilities, in the same sorted order
        cum = itertools.accumulate(d[ch] for ch in letters_set)
        cum_d = dict(zip(letters_set, cum))
    

    Now cum_d is a dictionary mapping each character (not just letters, of course, since you've done nothing to exclude whitespace and punctuation) to the cumulative probability of that character plus all the characters that sort before it. Note that sorted() orders by code point, so, for example, uppercase letters come before lowercase ones. How you plan to "plot" a dictionary, I have no idea. But hey, at least this does run, and it produces something that fits at least one interpretation of the vague specs you give for the task!-)
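For what it's worth, here is a minimal sketch of one possible interpretation. It assumes that "letters" means alphabetic characters only (str.isalpha()), compared case-insensitively, and that a plain-text bar chart is an acceptable stand-in for a "histogram". The function takes a string rather than a file, so you would pass it infile.read(). The names cumulative_letter_distribution and text_histogram are hypothetical helpers made up for this example, not part of any library.

```python
import collections
import itertools


def cumulative_letter_distribution(text):
    """Return (probabilities, cumulative) dicts keyed by letter, in sorted order.

    Assumption: only alphabetic characters count, and case is ignored.
    """
    letters = [ch.lower() for ch in text if ch.isalpha()]
    total = len(letters)
    if total == 0:
        # No letters at all: nothing to distribute.
        return {}, {}
    freqs = collections.Counter(letters)
    keys = sorted(freqs)  # distinct letters, alphabetical
    proba = {k: freqs[k] / total for k in keys}
    # Running sum of the per-letter probabilities; the last value is ~1.0.
    cum = dict(zip(keys, itertools.accumulate(proba[k] for k in keys)))
    return proba, cum


def text_histogram(values, width=40):
    """Render a dict of values in [0, 1] as one line of '#' per key (hypothetical helper)."""
    lines = []
    for key, value in values.items():
        bar = "#" * round(value * width)
        lines.append(f"{key} {value:.3f} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    proba, cum = cumulative_letter_distribution("Hello, World!")
    print(text_histogram(proba))  # per-letter probabilities
    print()
    print(text_histogram(cum))    # cumulative distribution
```

Because the cumulative values are built with floating-point addition, the last entry may differ from 1.0 in the final digits. If you want a graphical plot instead, the same dicts could be handed to a plotting library, e.g. matplotlib's pyplot.bar(list(cum), list(cum.values())).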