python math scipy statistics probability-density

Gumbel half distribution

I have generated a probability density function in python using scipy using the code below:

import matplotlib.pyplot as plt
from scipy.stats import gumbel_l
import numpy as np

data = gumbel_l.rvs(size=100000)
data = np.sort(data)

plt.hist(data, bins=50, density=True)
plt.plot(data, gumbel_l.pdf(data))
plt.show()

My quesiton is if there is a way to get a one tailed distribution of this, namely the left side of the distribution, both to generate it and to fit a pdf to it.

Solution

You can create a custom rv_continuous subclass. The minimum requirement, is that the custom class provides a pdf. The pdf can be obtained from gumbel_l's pdf up till x = 0 and being zero for positive x. The pdf needs to be normalized to get its area equal to 1, for which we can divide by gumbel_l's cdf(0).

With only the pdf implemented, you'll notice that obtaining random variates (.rvs) will be rather slow. Scipy rv_continuous very slow explains this can be remedied by generating too many variates and throwing away the values that are too high, or by providing an implementation for the ppf.

As the ppf can be obtained straightforward from gumbel_l's ppf, the code below implements that solution. A similar approach can be used to truncate at another position, or even to truncate at two spots.

import matplotlib.pyplot as plt
from scipy.stats import gumbel_l, rv_continuous
import numpy as np

class gumbel_l_trunc_gen(rv_continuous):
    "truncated gumbel_l distribution"

    def __init__(self, name='gumbel_l_trunc'):
        self.gumbel_l_cdf_0 = gumbel_l.cdf(0)
        self.gumbel_trunc_normalize = 1 / self.gumbel_l_cdf_0
        super().__init__(name=name)

    def _pdf(self, x):
        return np.where(x <= 0, gumbel_l.pdf(x) * self.gumbel_trunc_normalize, 0)

    def _cdf(self, x):
        return np.where(x <= 0, gumbel_l.cdf(x) * self.gumbel_trunc_normalize, 1)

    def _ppf(self, x):
        return gumbel_l.ppf(x * self.gumbel_l_cdf_0)

gumbel_l_trunc = gumbel_l_trunc_gen()

data = gumbel_l_trunc.rvs(size=100000)

x = np.linspace(min(data), 1, 500)
plt.hist(data, bins=50, density=True)
plt.plot(x, gumbel_l_trunc.pdf(x))
plt.show()