I'm using scipy.stats and I need the CDF up to a given value x for some distributions, I know PDFs can be greater than 1 because they are not probabilities but densities so they should integrate to 1 even if specific values are greater, but CDFs should never be greater than 1 and when running the cdf function on scipy.stats sometimes I get values like 2.89, i'm completely sure i'm using cdf and not pdf(that was my first guess), this is messing my results and algorithm because I need accumulated probabilities, why is scipy.stats cdf returning values greater than 1 and/or how should I proceed to fix it?
Code for reproducing the issue with a sample distribution and parameters(but it happens with others too):
from scipy import stats
distribution = stats.gausshyper
params = [9.482986347673158, 16.65813644507513, -38.11083665959626, 16.08698932118982, -13.387170754433273, 18.352117022674125]
test_val = [-0.512720,1,1]
arg = params[:-2]
loc = params[-2]
scale = params[-1]
print("cdf:",distribution.cdf(test_val,*arg, loc=loc,scale=scale))
print("pdf:",distribution.pdf(test_val,*arg, loc=loc,scale=scale))
cdf: [2.68047481 7.2027761 7.2027761 ] pdf: [2.76857133 2.23996739 2.23996739]
The problem lies in the parameters that you have specified for the Gaussian hypergeometric (HG) distribution, specifically in the third element of params
, which is the parameter beta in the HG distribution (see equation 2 in this paper for the definiton of the density of the Gauss Hypergeometric distr.). This parameter has to be positive for HG to have a valid density. Otherwise, the density won't integrate to 1, which is exactly what is happening in your example. With a negative beta, the distribution is not a valid probability distribution.
You can also find the requirement that beta (denoted as b) has to be positive in the scipy documentation here. Changing beta to a positive parameter immediately solves your problem:
from scipy import stats
distribution = stats.gausshyper
params = [9.482986347673158, 16.65813644507513, 38.11083665959626, 16.08698932118982, -13.387170754433273, 18.352117022674125]
test_val = [-0.512720,1,1]
arg = params[:-2]
loc = params[-2]
scale = params[-1]
print("cdf:",distribution.cdf(test_val,*arg, loc=loc,scale=scale))
print("pdf:",distribution.pdf(test_val,*arg, loc=loc,scale=scale))
Output:
cdf: [1. 1. 1.]
pdf: [3.83898392e-32 1.25685346e-35 1.25685346e-35]
,where all cdfs integrate to 1 as desired. Also note that your x also has to be between 0 and 1, as described in the scipy documentation here.