
Draw a CDF of values read from a file with matplotlib


I need to draw the CDF of integer values read from a file. I am following the example here. I am not sure how to normalize the data into a PDF and then compute the CDF from it.

import numpy as np
from pylab import *

with open ("D:/input_file.txt", "r+") as f:
    data = f.readlines()
    X = [int(line.strip()) for line in data]
    Y  = exp([-x**2 for x in X])  # is this correct? 

    # Normalize the data to a proper PDF
    Y /= ... # not sure what to write here

    # Compute the CDF
    CY = ... # not sure what to write here

    # Plot both
    plot(X,Y)
    plot(X,CY,'r--')

    show()
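The exp(-x**2) line in the code above describes a fixed bell-shaped curve, not the distribution of the values in the file, so it is not needed when the data itself defines the distribution. Below is a minimal sketch, assuming the file holds one integer per line. It counts how often each distinct value occurs, divides the counts by their total so they sum to 1 (for discrete integers this is strictly a probability mass function rather than a density), and takes the cumulative sum to get the CDF. The helper names read_integers and pmf_and_cdf are hypothetical, chosen only for this illustration.

```python
import numpy as np


def read_integers(path):
    """Hypothetical helper: read one integer per line, skipping blank lines."""
    with open(path, "r") as f:
        return [int(line) for line in f if line.strip()]


def pmf_and_cdf(values):
    """Return the sorted distinct values, their normalized frequencies, and the CDF."""
    # np.unique sorts the distinct values and counts how often each occurs
    X, counts = np.unique(values, return_counts=True)
    # Normalize: divide each count by the total so that Y sums to 1
    Y = counts / counts.sum()
    # CDF: running total of the probabilities, ending at 1
    CY = np.cumsum(Y)
    return X, Y, CY


# Sample values (the same list used in the answer below)
data = [88, 93, 184, 91, 107, 170, 88, 107, 167, 90]
X, Y, CY = pmf_and_cdf(data)
```

For the file itself, pmf_and_cdf(read_integers("D:/input_file.txt")) gives the same three arrays, which can be passed straight to plot(X, Y) and plot(X, CY, 'r--'). Because the CDF of discrete values jumps at each value, step(X, CY, where='post') may show its staircase shape more faithfully than a straight-line plot.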

Solution

  • One approach is to compute the probability density function (PDF) and the cumulative distribution function (CDF) with NumPy.

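    import numpy as np
    # -----------------
    data = [88, 93, 184, 91, 107, 170, 88, 107, 167, 90]
    # -----------------
    # get PDF: density=True scales the histogram so that it integrates to 1
    # (older NumPy spelled this normed=True; that argument has since been removed)
    ydata, xdata = np.histogram(data, bins=np.size(data), density=True)
    # ----------------
    # get CDF: xdata holds the bin edges, so np.diff(xdata) gives the bin widths;
    # density * width is the probability of each bin, and cumsum accumulates it
    cdf = np.cumsum(ydata * np.diff(xdata))
    # -----------------
    print('Sum:', np.sum(ydata * np.diff(xdata)))  # ~1.0, up to rounding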
    

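    This uses NumPy's histogram function to obtain a normalized histogram, which is an estimate of the PDF, and then calculates the CDF from it: each bin's density multiplied by its width is that bin's probability, and the cumulative sum of those probabilities is the CDF.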
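The histogram approach can be carried through to the original task of drawing both curves. Below is a minimal sketch, assuming the same one-integer-per-line file and a recent NumPy in which density=True replaces the removed normed=True argument; histogram_pdf_cdf and plot_pdf_cdf are hypothetical names. The PDF is plotted at the bin centres and the CDF at the right edge of each bin, since that is where each cumulative value has been reached. matplotlib is imported inside the plotting function so that the numeric part runs even where matplotlib is unavailable.

```python
import numpy as np


def histogram_pdf_cdf(values, bins=10):
    """Estimate the PDF with a density histogram and integrate it into a CDF."""
    # density=True scales the counts so the histogram integrates to 1
    pdf, edges = np.histogram(values, bins=bins, density=True)
    # Probability in each bin = density * bin width; the running sum is the CDF
    cdf = np.cumsum(pdf * np.diff(edges))
    return pdf, edges, cdf


def plot_pdf_cdf(pdf, edges, cdf):
    """Plot the PDF at bin centres and the CDF at the right edge of each bin."""
    import matplotlib.pyplot as plt  # local import: only needed for drawing

    centres = (edges[:-1] + edges[1:]) / 2
    plt.plot(centres, pdf, label="PDF")
    plt.plot(edges[1:], cdf, "r--", label="CDF")
    plt.legend()
    plt.show()


data = [88, 93, 184, 91, 107, 170, 88, 107, 167, 90]
# To use the file instead (assumed format: one integer per line):
# with open("D:/input_file.txt", "r") as f:
#     data = [int(line) for line in f if line.strip()]
pdf, edges, cdf = histogram_pdf_cdf(data, bins=len(data))
```

Calling plot_pdf_cdf(pdf, edges, cdf) afterwards draws the PDF as a solid line and the CDF as a dashed red line. Whatever bin count is chosen, the last CDF value is 1 because the density histogram integrates to 1.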