Tags: python, python-2.7, scipy, normal-distribution, kolmogorov-smirnov

Non-standard distribution parameters for KS testing?


Can kstest in scipy.stats be used with non-standard distribution functions (i.e. varying the degrees of freedom for Student's t, or varying gamma for Cauchy)? My end goal is to find the maximum p-value and the corresponding parameter for my distribution fit, but that isn't the issue here.

EDIT:

"

The Cauchy pdf in scipy.stats is:

cauchy.pdf(x) = 1 / (pi * (1 + x**2))

which implies x_0 = 0 for the location parameter and Y = 1 for gamma (the scale parameter). I actually need it to look like this:

cauchy.pdf(x, x_0, Y) = Y**2 / [(Y * pi) * ((x - x_0)**2 + Y**2)]

"

Q1) Could Student's t, at least, be used in a way like

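import numpy as np
import scipy.stats

stuff = []
for dof in range(1, 100):  # degrees of freedom must be positive, so start at 1
    d, p = scipy.stats.kstest(data, "t", args=(dof,))  # kstest returns (statistic, p-value)
    stuff.append(np.hstack((d, p, dof)))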

since it seems to have the option to vary the parameter?

Q2) How would you do this if you needed the full normal distribution equation (need to vary sigma) and Cauchy as written above (need to vary gamma)? EDIT: Instead of searching scipy.stats for non-standard distributions, is it actually possible to feed a function I write into kstest and get p-values from it?

Thanks kindly


Solution

  • It seems that what you really want to do is parameter estimation. Using the KS test in this manner is not really what it is meant for. You should use the .fit method of the corresponding distribution.

    >>> import numpy as np, scipy.stats as stats
    >>> arr = stats.norm.rvs(loc=10, scale=3, size=10) # generate 10 random samples from a normal distribution
    >>> arr
    array([ 11.54239861,  15.76348509,  12.65427353,  13.32551871,
            10.5756376 ,   7.98128118,  14.39058752,  15.08548683,
             9.21976924,  13.1020294 ])
    >>> stats.norm.fit(arr)
    (12.364046769964004, 2.3998164726918607)
    >>> stats.cauchy.fit(arr)
    (12.921113834451496, 1.5012714431045815)
    

    Now to quickly check the documentation:

    >>> help(stats.cauchy.fit)
    
    Help on method fit in module scipy.stats._distn_infrastructure:
    
    fit(data, *args, **kwds) method of scipy.stats._continuous_distns.cauchy_gen instance
        Return MLEs for shape, location, and scale parameters from data.
    
        MLE stands for Maximum Likelihood Estimate.  Starting estimates for
        the fit are given by input arguments; for any arguments not provided
        with starting estimates, ``self._fitstart(data)`` is called to generate
        such.
    
        One can hold some parameters fixed to specific values by passing in
        keyword arguments ``f0``, ``f1``, ..., ``fn`` (for shape parameters)
        and ``floc`` and ``fscale`` (for location and scale parameters,
        respectively).
    
    ...
    
    Returns
    -------
    shape, loc, scale : tuple of floats
        MLEs for any shape statistics, followed by those for location and
        scale.
    
    Notes
    -----
    This fit is computed by maximizing a log-likelihood function, with
    penalty applied for samples outside of range of the distribution. The
    returned answer is not guaranteed to be the globally optimal MLE, it
    may only be locally optimal, or the optimization may fail altogether.
    

    So, if you want to hold one of those parameters constant, you can easily do:

    >>> stats.cauchy.fit(arr, floc=10)
    (10, 2.4905786982353786)
    >>> stats.norm.fit(arr, floc=10)
    (10, 3.3686549590571668)