python numpy scipy distribution model-fitting

Fitting a Custom Scipy Distribution

I have redefined the lognormal distribution using a custom scipy class. I simulated from this distribution and am trying to recover the parameters I specified; however, the fit method returns different parameters.

import numpy as np
import pandas as pd
from scipy.stats import rv_continuous
from scipy.special import erf
from scipy.special import erfinv

class lognorm_v2(rv_continuous):

    def _pdf(self, x, mu, sigma):
        return 1 / (x * sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((np.log(x) - mu)/sigma)**2)

    def _cdf(self, x, mu, sigma):
        return 0.5 + 0.5 * erf((np.log(x) - mu)/ (np.sqrt(2)*sigma))
    
    def _sf(self, x, mu, sigma):
        u = (x)**b/(1+x**b)
        return 1 - 0.5 + 0.5 * erf((np.log(x) - mu)/ (np.sqrt(2)*sigma))
    
    def _ppf(self,x, mu, sigma):
        return np.exp(sigma * erfinv(2*x - 1) - mu)
    
    def _argcheck(self, mu, sigma):
        s = sigma > 0
        return s

np.random.seed(seed=111)
logn = lognorm_v2(name='lognorm_v2',a=0,b=np.inf)
test = logn.rvs(mu=2,sigma=1,loc=0,scale=1,size=100000)

logn.fit(test)
logn.fit(test,floc=0,fscale=1)

When loc and scale are not fixed I obtain the parameters:

(0.9216388162274325, 0.7061876689651909, -0.0003659266464081178, 0.05399544825451739)

When they are fixed the result is:

(-2.0007136838780917, 0.7086144279779958, 0, 1)

Why can't I recover the mu = 2 and sigma = 1 specified in the original simulation? I understand I will not get the exact values, but they should be very close with 100K samples. I am using numpy 1.19.2 and scipy 1.5.2. Thank you!

Solution

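I've corrected the code with a proper _ppf, and it now produces the expected fits for mu and sigma. The _ppf in the question was missing the factor sqrt(2) and had the sign of mu flipped. Since rvs falls back to _ppf (inverse-transform sampling) when no _rvs is defined, the simulated data were actually lognormal with mu = -2 and sigma = 1/sqrt(2) ≈ 0.707, which is exactly what fit recovered with loc and scale fixed. The unfixed fit describes nearly the same distribution: scale and exp(mu) play the same role, so only mu + log(scale) is identified, and 0.9216 + log(0.0540) ≈ -2.00. I also dropped _sf, which referenced an undefined b and had the wrong sign on the erf term; the default 1 - cdf is used instead.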
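To see the mismatch directly, here is a minimal sketch that checks whether a _ppf really inverts the _cdf, and what the question's _ppf yields when fed uniform draws. The helper names cdf, ppf_question and ppf_fixed are my own for illustration, not part of scipy, and the standalone functions only mirror the formulas in the class methods.

```python
import numpy as np
from scipy.special import erf, erfinv

def cdf(x, mu, sigma):
    # Lognormal CDF, same formula as _cdf in the class
    return 0.5 + 0.5 * erf((np.log(x) - mu) / (np.sqrt(2) * sigma))

def ppf_question(q, mu, sigma):
    # _ppf exactly as posted in the question
    return np.exp(sigma * erfinv(2 * q - 1) - mu)

def ppf_fixed(q, mu, sigma):
    # Algebraic inverse of cdf(): solve q = cdf(x) for x
    return np.exp(mu + sigma * np.sqrt(2) * erfinv(2 * q - 1))

mu, sigma = 2.0, 1.0
q = np.linspace(0.01, 0.99, 99)

# A correct ppf must satisfy cdf(ppf(q)) == q
err_question = np.max(np.abs(cdf(ppf_question(q, mu, sigma), mu, sigma) - q))
err_fixed = np.max(np.abs(cdf(ppf_fixed(q, mu, sigma), mu, sigma) - q))
print("round-trip error, question:", err_question)
print("round-trip error, fixed:   ", err_fixed)

# Inverse-transform sampling through the question's ppf
rng = np.random.default_rng(111)
u = rng.uniform(size=100_000)
log_samples = np.log(ppf_question(u, mu, sigma))
sample_mu = log_samples.mean()
sample_sigma = log_samples.std()
print("mean of log-samples:", sample_mu)
print("std of log-samples: ", sample_sigma)
```

With mu = 2 and sigma = 1, the question's version should fail the round trip badly, and the log of its samples should have mean near -2 and standard deviation near 0.707, matching the fitted values in the question. Running this kind of round-trip check on any custom _ppf before calling rvs is a cheap way to catch the same mistake.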

Code, Python 3.9 Windows 10 x64

import numpy as np
from scipy.stats import rv_continuous
from scipy.special import erf
from scipy.special import erfinv

SQRT2 = np.float64(1.4142135623730951)

class lognorm_v2(rv_continuous):

    def _pdf(self, x, μ, σ):
        return 1 / (x * σ * SQRT2 * np.sqrt(np.pi)) * np.exp(-0.5 * ((np.log(x) - μ)/σ)**2)

    def _cdf(self, x, μ, σ):
        return 0.5 + 0.5 * erf((np.log(x) - μ)/ (SQRT2*σ))

    def _ppf(self, x, μ, σ):
        # inverse of _cdf; rvs() draws samples through this method
        return np.exp(μ + σ * SQRT2 * erfinv(2.0*x - 1.0))

    def _argcheck(self, μ, σ):
        s = σ > 0.0
        return s

np.random.seed(seed=111)
logn = lognorm_v2(name='lognorm_v2', a=0.0, b=np.inf)
test = logn.rvs(μ=2.0,σ=1.0,loc=0.0,scale=1.0, size=100000)

print(logn.fit(test, floc=0, fscale=1))

prints out

(1.9990788106319746, 1.0021523463000124, 0, 1)
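
As an independent cross-check, the same parameterization can be mapped onto the built-in scipy.stats.lognorm, assuming its documented convention s = sigma and scale = exp(mu). This is only a sketch: the samples come from NumPy's Generator.lognormal rather than from the custom class, and the variable names are my own.

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(111)
samples = rng.lognormal(mean=2.0, sigma=1.0, size=100_000)

# Built-in lognorm uses shape s = sigma and scale = exp(mu); keep loc fixed at 0
s_hat, loc_hat, scale_hat = lognorm.fit(samples, floc=0)
mu_hat = np.log(scale_hat)
sigma_hat = s_hat

# With loc = 0 the maximum-likelihood estimates are the mean and
# (population) standard deviation of log(samples)
mu_mle = np.log(samples).mean()
sigma_mle = np.log(samples).std()

print("fit:        ", mu_hat, sigma_hat)
print("closed form:", mu_mle, sigma_mle)
```

Both pairs should land close to mu = 2 and sigma = 1. If a custom class disagrees with these on the same data, the problem is most likely in the class rather than in fit.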