Search code examples
pythonnumpyscipydistribution

Converting np.random.logistic to scipy.stats.fisk


I cannot figure out how to convert np.random.logistic to scipy.stats.fisk, here's my code:

import numpy as np
import numpy.random as npr
import scipy.stats as ss
import matplotlib.pyplot as plt

SEED = 1337
SIZE = 1_000_000
Generator = npr.default_rng(seed=SEED)
PARAMS = {
    "loc": 0,
    "scale": 1
}
n = Generator.logistic(
    loc=PARAMS['loc'],
    scale=PARAMS['scale'],
    size=SIZE,
)
ns = np.exp(n)
s = ss.fisk(
    c=1/PARAMS['scale'],
    scale=np.exp(PARAMS['loc']),
).rvs(
    random_state=SEED,
    size=SIZE,
)

Reading from the scipy doc, Suppose n is a logistic random variable with location loc and scale scale. Then ns=np.exp(n) is a Fisk (log-logistic) random variable with scale=np.exp(l) and shape c=1/scale.

However, plotting n and ns gives completely different distributions. What am I doing wrong?

plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
plt.hist(n, bins=30, density=True, alpha=0.6, color='g', label="n")
plt.title('Logistic Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()

plt.subplot(1, 2, 2)
plt.hist(ns, bins=30, density=True, alpha=0.6, color='b', label="ns")
plt.hist(s, bins=30, density=True, alpha=0.6, color='r', label="s")
plt.title('Log-Logistic Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()

plt.show()

result of plotting snippet above


Solution

  • If you zoom in on the plots of the Fisk-distributed samples, you will see that they have the same distribution.

    plt.hist(s, range=[0, 25], bins=40, alpha=0.5)
    plt.hist(ns, range=[0, 25], bins=40, alpha=0.5)
    

    enter image description here

    If you fit a fisk distribution to each sample, you'll find that the parameters are as expected.

    ss.fisk.fit(s)
    # (0.9999917101109586, 1.0994749033764006e-06, 1.0004068082206112)
    ss.fisk.fit(ns)
    # (1.0011298825776729, 1.5284376494445832e-06, 0.9973623597025572)
    

    The distribution just has a long tail and therefore can produce some extreme values, which throws off the plots with default range.