Below is the data for which I want to plot the PDF: https://gist.github.com/ecenm/cbbdcea724e199dc60fe4a38b7791eb8#file-64_general-out

Below is the script:
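```python
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('64_general.out')

# density=True normalizes the histogram so its total area is 1.
# (The old `normed` argument is deprecated and removed in recent NumPy versions.)
H, X1 = np.histogram(data, bins=10, density=True)  # Is this the right way to get the PDF?

# X1 holds the 11 bin edges; plot each density value at the center of its bin.
centers = (X1[:-1] + X1[1:]) / 2

plt.xlabel('Latency')
plt.ylabel('PDF')
plt.title('PDF of latency values')
plt.plot(centers, H)
plt.show()
```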
When I plot the above, I get the following.
It is a legitimate way of approximating the PDF. However, np.histogram groups the values into bins (here 10 equal-width bins spanning the data range), so you won't get the exact frequency of each individual value in your input. For a more exact estimate, count the occurrences of each distinct value and divide by the total count. Also, since these are discrete values, plotting them as points or bars (rather than a connected line) gives a more accurate impression.
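A minimal sketch of the counting approach, assuming the latency values are discrete (for example integer clock ticks). The sample array is made up for illustration and stands in for `np.loadtxt('64_general.out')`:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample latencies; replace with np.loadtxt('64_general.out').
data = np.array([12, 15, 12, 13, 15, 15, 20, 12, 13, 15])

# Count how often each distinct value occurs (values come back sorted).
values, counts = np.unique(data, return_counts=True)

# Divide by the total count to get the relative frequency of each value.
pmf = counts / counts.sum()

# Discrete values: draw a stem at each value instead of a connected line.
plt.stem(values, pmf)
plt.xlabel('Latency')
plt.ylabel('Relative frequency')
plt.title('Relative frequency of latency values')
plt.show()
```

For this sample, `values` is `[12, 13, 15, 20]` and `pmf` is `[0.3, 0.2, 0.4, 0.1]`. Using `plt.bar(values, pmf)` instead of `plt.stem` would give the bar version.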
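In the discrete case, the relative frequencies should sum to 1. In the continuous case, the area under the PDF should be 1: for a histogram returned with density=True that area is sum(H * np.diff(X1)), and for a density sampled on a grid of points you can, for example, approximate the integral with np.trapz() (renamed np.trapezoid() in NumPy 2.0).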
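A sketch of these normalization checks on hypothetical data. The exponential samples, the rounding used to make them discrete, and the standard normal curve are illustrative assumptions only, not part of the original data. The code uses `np.trapezoid` when it exists and falls back to `np.trapz` on older NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical latency samples; replace with np.loadtxt('64_general.out').
data = rng.exponential(scale=5.0, size=1000)

# Discrete case (values rounded to integers here): relative frequencies sum to 1.
_, counts = np.unique(np.round(data), return_counts=True)
freq = counts / counts.sum()
print(freq.sum())  # close to 1.0

# Histogram with density=True: bar heights times bin widths sum to 1.
H, edges = np.histogram(data, bins=10, density=True)
area = np.sum(H * np.diff(edges))
print(area)  # close to 1.0

# Continuous case: integrate a density sampled on a grid with the trapezoidal rule.
trapezoid = getattr(np, "trapezoid", None) or np.trapz
x = np.linspace(-10, 10, 2001)
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density (example only)
integral = trapezoid(pdf, x)
print(integral)  # close to 1.0
```

Note that applying the trapezoidal rule to the histogram's bin centers would not give exactly 1, because it skips the outer half of the first and last bins; the width-weighted sum is the exact area of the histogram.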