Tags: python, numpy, scipy, statistics

How to compute the median and 68% confidence interval around the median of a non-Gaussian distribution in Python?


I have a data set stored as a NumPy array, a = [a1, a2, ...], together with a weight for each point, w = [w1, w2, w3, ...]. I have computed the histogram with numpy.histogram, which gives me the hist array. Now I want to compute the median of this probability distribution and the 68% confidence interval around the median. Note that my data set is not Gaussian.

Can anyone help? I am using Python.


Solution

  • Here is a solution using scipy.stats.rv_discrete:

    from __future__ import division, print_function
    import numpy as np, scipy.stats as st
    
    # example data set
    a = np.arange(20)
    w = a + 1
    
    # create custom discrete random variable from data set
    rv = st.rv_discrete(values=(a, w/w.sum()))
    
    # scipy.stats.rv_discrete has methods for median, confidence interval, etc.
    # interval(0.68) is equal-tailed: it spans the 16th to the 84th percentile
    print("median:", rv.median())
    print("68% CI:", rv.interval(0.68))
    

    The output reflects the uneven weights in the example data set:

    median: 13.0
    68% CI: (7.0, 18.0)
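
The same numbers can be computed with NumPy alone by sorting the data and accumulating the normalized weights. This is only a minimal sketch: weighted_quantile is a hypothetical helper name, not a NumPy or SciPy function, and it assumes the inverse-CDF convention (the smallest data value whose cumulative weight reaches the requested fraction), with no interpolation between points. It may be convenient when the data contain repeated or non-integer values.

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Hypothetical helper: smallest value whose cumulative weight reaches q."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # sort the data and carry the weights along
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # empirical weighted CDF evaluated at each sorted data point
    cdf = np.cumsum(weights) / weights.sum()
    # first index where the CDF reaches q; clamp to guard against rounding at q = 1
    idx = np.minimum(np.searchsorted(cdf, q, side="left"), len(values) - 1)
    return values[idx]

# same example data set as above
a = np.arange(20)
w = a + 1

median = weighted_quantile(a, w, 0.5)
# equal-tailed 68% interval: 16th and 84th percentiles
lo, hi = weighted_quantile(a, w, [0.16, 0.84])
print("median:", median)
print("68% interval:", (lo, hi))
```

For the example data this gives a median of 13.0 and an interval of (7.0, 18.0), matching the rv_discrete result.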