Tags: python, numpy, matplotlib, scipy, cdf

Read file and plot CDF in Python


I need to read a long file of timestamps (in seconds) and plot its CDF using numpy or scipy. I tried with numpy, but the output is not what it is supposed to be. The code is below; any suggestions appreciated.

import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('Filename.txt')
sorted_data = np.sort(data)
cumulative = np.cumsum(sorted_data)

plt.plot(cumulative)
plt.show()

Solution

  • You have two options:

    1: you can bin the data first. This can be done easily with the numpy.histogram function:

    import numpy as np
    import matplotlib.pyplot as plt
    
    data = np.loadtxt('Filename.txt')
    
    # Choose how many bins you want here
    num_bins = 20
    
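    # Use the histogram function to bin the data
    # (density=True replaces the old normed=True argument, which recent NumPy releases no longer accept)
    counts, bin_edges = np.histogram(data, bins=num_bins, density=True)
    
    # Now find the cdf: with density=True the counts are densities, so
    # multiply by the bin widths before summing so the curve ends at 1
    cdf = np.cumsum(counts * np.diff(bin_edges))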
    
    # And finally plot the cdf
    plt.plot(bin_edges[1:], cdf)
    
    plt.show()
    
    

    2: rather than use numpy.cumsum, just plot the sorted_data array against the fraction of items smaller than each element in the array (see this answer for more details https://stackoverflow.com/a/11692365/588071):

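    import numpy as np
    import matplotlib.pyplot as plt
    
    data = np.loadtxt('Filename.txt')
    
    # Sort the values so they can serve as the x axis
    sorted_data = np.sort(data)
    
    # Fraction of items smaller than each element, running from 0 to 1
    yvals = np.arange(len(sorted_data)) / float(len(sorted_data) - 1)
    
    plt.plot(sorted_data, yvals)
    plt.show()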
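
For comparison, here is a minimal sketch of both options wrapped as functions and run on synthetic data, so it works without Filename.txt. The names empirical_cdf and histogram_cdf, the random exponential sample, and the seed are all assumptions made for illustration, not part of the original answer. This version also uses the i/n convention (fraction of samples less than or equal to each value) rather than i/(n-1) as above; both are common choices.

```python
import numpy as np


def empirical_cdf(samples):
    """Return (x, y) where y[i] is the fraction of samples <= x[i]."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def histogram_cdf(samples, num_bins=20):
    """Return (right bin edges, cumulative fraction of samples up to each edge)."""
    counts, bin_edges = np.histogram(samples, bins=num_bins)
    # Raw counts divided by the total give the probability mass per bin,
    # so the cumulative sum ends at 1 regardless of bin width.
    cdf = np.cumsum(counts) / counts.sum()
    return bin_edges[1:], cdf


# Hypothetical stand-in for the timestamps in the file
rng = np.random.default_rng(0)
timestamps = rng.exponential(scale=5.0, size=1000)

x, y = empirical_cdf(timestamps)
edges, cdf = histogram_cdf(timestamps)
```

To draw the results, something like plt.step(x, y, where='post') for the exact curve and plt.plot(edges, cdf) for the binned one should work. The binned version is smoother but depends on the number of bins, while the sorted version uses every data point.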