I am trying to analyse the frequency of the letters in a text. As an example I use a short sentence here, but the code is meant to analyse huge texts, so it should be efficient.
test = "quatre jutges dun jutjat mengen fetge dun penjat"
Then I created a function which counts the frequencies
def create_dictionary2(txt):
    dictionary = {}
    for x in set(txt):
        dictionary[x] = txt.count(x) / len(txt)
    return dictionary
And then I plot the result:
import numpy as np
import matplotlib.pyplot as plt
test_dict = create_dictionary2(test)
plt.bar(test_dict.keys(), test_dict.values(), width=0.5, color='g')
ISSUES:
I want to see all the letters, but some of them are not shown (the output is "Container object of 15 artists"). How can I expand the histogram?
Then, I would like to sort the histogram, to obtain something like this.
For counting we can use a Counter object. Counter also supports getting key-value pairs ordered from the most common value to the least common:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
c = Counter("quatre jutges dun jutjat mengen fetge dun penjat")
plt.bar(*zip(*c.most_common()), width=.5, color='g')
plt.show()
The most_common method returns a list of key-value tuples. The *zip(*..) is used to unpack (see this answer).
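To make the unpacking step more concrete, here is a small sketch that splits it into separate statements. The variable names pairs, letters and counts are only illustrative and are not part of the code above.

```python
from collections import Counter

c = Counter("quatre jutges dun jutjat mengen fetge dun penjat")

# most_common() returns (character, count) tuples, highest count first.
pairs = c.most_common()

# zip(*pairs) transposes the list of pairs into two tuples:
# one with all the characters and one with all the counts.
letters, counts = zip(*pairs)

print(letters)
print(counts)
```

Calling plt.bar(*zip(*pairs)) is then the same as calling plt.bar(letters, counts): the outer * passes the two tuples as the first two positional arguments.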
Note: I haven't updated the width or color to match your result plots.
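Your function returns relative frequencies rather than raw counts. If you want to keep that, one possible sketch is to divide each count by the total. The helper name letter_frequencies is hypothetical, chosen here only for illustration. Counter goes over the text once, whereas calling txt.count(x) for every distinct character scans the text once per character, which may matter for huge texts.

```python
from collections import Counter


def letter_frequencies(txt):
    """Return (character, relative frequency) pairs, most frequent first."""
    counts = Counter(txt)         # one pass over the text
    total = sum(counts.values())  # same value as len(txt)
    return [(ch, n / total) for ch, n in counts.most_common()]


freqs = letter_frequencies("quatre jutges dun jutjat mengen fetge dun penjat")
print(freqs[:3])
```

The resulting pairs can be plotted the same way as before, with plt.bar(*zip(*freqs), width=.5, color='g').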