python math statistics cdf kurtosis

Python Skewness and Kurtosis in Naive Bayes classifier


I am creating a Naive Bayes classifier in Python that will be able to guess which month it is based on some weather data of a single day.

Currently the mean and standard deviation are used to classify the month; however, I figured that adding skewness and kurtosis might help improve the accuracy.

I am currently using scipy.stats.norm.cdf to calculate the probability, but I cannot seem to find any cdf function in Python that takes skewness and kurtosis into account.

I feel like I might not be understanding skewness and kurtosis correctly. Skewness and kurtosis have an impact on the cdf function and therefore I expected them to be given as a parameter.

Is there something fundamentally wrong with my understanding of skewness, kurtosis and the cdf function? If not, then where can I find an implementation of the cdf function in Python that takes all these parameters into account?


Solution

  • The normal distribution, which you use (scipy.stats.norm) and which is typically used to model the one-dimensional conditional distributions in Naive Bayes, is fully defined by just two parameters: its mean and std. There is no point in specifying skewness/kurtosis, as they are constant for a normal distribution (skewness is 0 and kurtosis is 3, i.e. excess kurtosis is 0).

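    What you are probably thinking of is a Pearson distribution, a family designed to match more moments (mean, std, skewness and kurtosis). Note that scipy.stats.pearson3, linked below, is the Pearson type III member: it takes a skew shape parameter plus loc and scale, and its kurtosis is then determined by the skew rather than set independently.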

    http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.pearson3.html
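
As a rough illustration of both points, here is a minimal sketch. The sample data is synthetic and hypothetical (a right-skewed gamma sample standing in for one weather feature of one month), and the variable names are made up for the example. It first shows that the skewness and excess kurtosis of scipy.stats.norm do not depend on loc and scale, then builds a scipy.stats.pearson3 distribution from the sample mean, std and skewness (a simple method-of-moments choice, not the only way to fit it) and compares its cdf with the normal one.

```python
import numpy as np
from scipy import stats

# Moments of a normal distribution: skewness and excess kurtosis are
# always 0, whatever loc (mean) and scale (std) are.
# SciPy reports *excess* kurtosis, i.e. kurtosis minus 3.
mean, var, skew, kurt = stats.norm.stats(loc=10.0, scale=2.5, moments="mvsk")
print("normal skew:", float(skew), "excess kurtosis:", float(kurt))

# Hypothetical right-skewed feature values (synthetic, for illustration only).
rng = np.random.default_rng(0)
sample = rng.gamma(shape=4.0, scale=2.0, size=5000)

# Method-of-moments estimates from the sample.
sample_mean = sample.mean()
sample_std = sample.std(ddof=1)
sample_skew = stats.skew(sample)

# pearson3 takes the skewness as its shape parameter; loc and scale
# then play the role of mean and standard deviation.
skewed = stats.pearson3(sample_skew, loc=sample_mean, scale=sample_std)
normal = stats.norm(loc=sample_mean, scale=sample_std)

# Compare the two cdfs at an arbitrary point.
x = 5.0
p_skewed = skewed.cdf(x)
p_normal = normal.cdf(x)
print("pearson3 cdf:", p_skewed, "normal cdf:", p_normal)
```

If kurtosis must be matched independently of skewness, pearson3 is not enough, since it only has the one shape parameter; a different distribution family would be needed for that.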