Search code examples
pythonscipystatisticsastronomyprobability-density

Custom PDF from scipy.stats.rv_continuous unwanted upper-bound


I am attempting to generate a random probability density function of QSO's of certain luminosity with the form:

1/( (L/L_B^* )^alpha + (L/L_B^* )^beta )

where L_B^*, alpha, and beta are all constants. To do this, the following code is used:

import scipy.stats as st

logLbreak = 43.88
alpha = 3.4
beta = 1.6


class my_pdf(st.rv_continuous):

    def _pdf(self,l_L): 
        #"l_L" in this is always log L        
        L = 10**(l_L/logLbreak)
        D = 1/(L**alpha + L**beta)
        return D

dist_Log_L = my_pdf(momtype = 0, a = 0,name='l_L_dist')


distro = dist_Log_L.rvs(size = 10000)

(L/L^* is rased to a power of 10 since everything is being done in a log scale)

The distribution is supposed to produce a graph that approximates this, trailing off to infinity, but in reality the graph it produces looks like this (10,000 samples). The upper bound is the same regardless of the amount of samples that are used. Is there a reason it is being restricted in the way it is?


Solution

  • Your PDF is not properly normalized. The integral of a PDF over the domain must be 1. Your PDF integrates to approximately 3.4712:

    In [72]: from scipy.integrate import quad
    
    In [73]: quad(dist_Log_L._pdf, 0, 100)
    Out[73]: (3.4712183965415373, 2.0134487716044682e-11)
    
    In [74]: quad(dist_Log_L._pdf, 0, 800)
    Out[74]: (3.4712184965748905, 2.013626296581202e-11)
    
    In [75]: quad(dist_Log_L._pdf, 0, 1000)
    Out[75]: (3.47121849657489, 8.412130378805368e-10)
    

    This will break the class's implementation of inverse transform sampling. It will only generate samples from the domain up to where the integral of the PDF from 0 to x first reaches 1.0, which in your case is about 2.325

    In [81]: quad(dist_Log_L._pdf, 0, 2.325)
    Out[81]: (1.0000875374350238, 1.1103202107010366e-14)
    

    That is, in fact, what you see in your histogram.

    As a quick fix to verify the issue, I modified the return statement of the _pdf() method to:

            return D/3.47121849657489
    

    and ran your script again. (In a real fix, that value will be a function of the other parameters.) Then the commands

    In [85]: import matplotlib.pyplot as plt
    
    In [86]: plt.hist(distro, bins=31)
    

    generates this plot:

    plot