Tags: python, confidence-interval, probability-density, probability-distribution

Finding a 95% confidence interval given a table of observations


I have the following table:

        perc
0   59.98797
1   61.89383
2   61.08403
3   61.00661
4   62.64753
5   62.18118
6   60.74520
7   57.83964
8   62.09705
9   57.07985
10  58.62777
11  60.02589
12  58.74948
13  59.14136
14  58.37719
15  58.27401
16  59.67806
17  58.62855
18  58.45272
19  57.62186
20  58.64749
21  58.88152
22  54.80138
23  59.57697
24  60.26713
25  60.96022
26  55.59813
27  60.32104
28  57.95403
29  58.90658
30  53.72838
31  57.03986
32  58.14056
33  53.62257
34  57.08174
35  57.26881
36  48.80800
37  56.90632
38  59.08444
39  57.36432

consisting of various percentages.

I'm interested in creating a probability distribution based on these percentages for the sake of coming up with a confidence interval (say 95%) of what we would expect this percentage to be.

With percDone['perc'].plot.density() I can get a nice density plot, but I don't know how to extract CIs from it. I also considered building a weighted KDE, but again I don't know how to get CIs from that. How should I go about finding my confidence interval?

Thanks!


Solution

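  • Try this with scipy and numpy. Note that it gives a 95% confidence interval for the mean of the percentages, based on Student's t-distribution, not a range expected to contain 95% of the individual values: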

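    import scipy.stats as st
    import numpy as np

    # Get the observations as a list
    lst = list(percDone['perc'])

    # 95% confidence interval for the mean (Student's t, n-1 degrees of freedom).
    # The confidence level is passed positionally because newer SciPy releases
    # renamed the `alpha` keyword to `confidence`.
    st.t.interval(0.95, df=len(lst)-1,
                  loc=np.mean(lst),
                  scale=st.sem(lst))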
    

    For kernel density estimation, see the scikit-learn KernelDensity documentation: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html#sklearn.neighbors.KernelDensity

    Hope this helps.
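
To make the two readings of "95% interval" concrete, here is a minimal, self-contained sketch that uses the 40 values from the question's table. It computes (1) the t-based confidence interval for the mean, as in the answer above, and (2) a central 95% interval read off a kernel density estimate, which is probably closer to what the density plot suggests. The second part is an assumption on my side, not something the answer prescribes: it uses scipy.stats.gaussian_kde with its default bandwidth rather than scikit-learn, and it gets the quantiles by numerically accumulating the density on a grid. The grid padding of 10 and the grid size are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

# The 40 observations from the table in the question.
perc = np.array([
    59.98797, 61.89383, 61.08403, 61.00661, 62.64753, 62.18118, 60.74520,
    57.83964, 62.09705, 57.07985, 58.62777, 60.02589, 58.74948, 59.14136,
    58.37719, 58.27401, 59.67806, 58.62855, 58.45272, 57.62186, 58.64749,
    58.88152, 54.80138, 59.57697, 60.26713, 60.96022, 55.59813, 60.32104,
    57.95403, 58.90658, 53.72838, 57.03986, 58.14056, 53.62257, 57.08174,
    57.26881, 48.80800, 56.90632, 59.08444, 57.36432,
])

# (1) 95% confidence interval for the MEAN (Student's t, n-1 degrees of freedom).
mean_ci = stats.t.interval(
    0.95, df=len(perc) - 1, loc=perc.mean(), scale=stats.sem(perc)
)

# (2) Central 95% interval of a Gaussian KDE fitted to the data.
# Assumption: default bandwidth (Scott's rule); other bandwidths shift the result.
kde = stats.gaussian_kde(perc)

# Evaluate the density on a grid that extends past the data on both sides.
grid = np.linspace(perc.min() - 10, perc.max() + 10, 4001)
density = kde(grid)

# Approximate the CDF by accumulating the density, then normalise it to end at 1.
cdf = np.cumsum(density)
cdf /= cdf[-1]

# Invert the CDF at 2.5% and 97.5% by linear interpolation.
kde_lower, kde_upper = np.interp([0.025, 0.975], cdf, grid)

print("95% CI for the mean:      ", mean_ci)
print("central 95% of the KDE:   ", (kde_lower, kde_upper))
```

The first interval is narrow because it only describes uncertainty about the average. The second is wider because it describes where individual percentages tend to fall. Which one you want depends on whether "what we would expect this percentage to be" refers to the long-run average or to a single new observation.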