Search code examples
pythonscipy

Python: Sampling using inverse cdf technique


I have a complicated (non standard) distribution function that I want to sample to generate simulated data points using the inverse cdf technique. For the sake of this example I will consider a Gaussian distribution

var=100
def f(x,a):
     def g(y):
       return (1/np.sqrt(2*np.pi*var))*np.exp(-y**2/(2*var))
   b,err=integrate.quad(g,-np.inf,x) 
   return b-a

I want to generate values between a=[0,1], a=np.linspace(0,1,10000,endpoint=False) and use scipy.optimize.fsolve to solve for x for each a.

I have two questions:

  1. How to use fsolve for an array of values a ?

  2. fsolve takes an initial guess x0, how to pick a good guess value?


Solution

  • Here's how you do it, I replaced 10000 with 10 as it's going to take a while. My initial guess is just 0, and I set it to the previous iteration for the next guess as it should be quite close to the solution. You can further bound this if you want so it's strictly above it.

    As a side comment this kind of sampling for complicated distributions isn't really feasible, as computing the cdf can be rather difficult. There are other sampling techniques to address these issues such as Gibbs sampling, Metropolis Hastings, etc.

    var = 100
    
    def f(x, a):
        def g(y):
            return (1/np.sqrt(2*np.pi*var))*np.exp(-y**2/(2*var))
    
        b, err = sp.integrate.quad(g, -np.inf, x) 
        return b - a
    
    
    a = np.linspace(0, 1, 10, endpoint=False)[1:]
    x0 = 0
    for a_ in a:
        xi = sp.optimize.fsolve(f, x0 + 0.01, args=(a_,))[0]
        print(xi)
        x0 = xi
    

    [EDIT] It seems to get stuck near 0, adding a small number fixes it, I'm not sure why as I don't know how fsolve works.