Search code examples
python-3.xpandasnumpystatisticsgamma-distribution

Python: Numpy Gamma Function Produces Wrong Mean Value For Scale Parameter


I'm trying to draw 1000 samples ( each of size 227 ) from numpy.random's gamma method, so each sample value should be i.i.d ( independent and identically distributed ). However, the mean value for the scale parameter is wrong.

My shape parameter ( alpha ) is 0.375 and my scale parameter ( lambda ) is 1.674

According to my textbook, here are the estimated value formulas for these two parameters:

alpha = ( xbar ^ 2 ) / ( sigma_hat ^ 2 )
lambda = ( xbar ) / ( sigma_hat ^ 2 )

I think I might be using Pandas .apply() method incorrectly or maybe my get_lambda_hat function is wrong.

# In[11]:

# Import libraries:

import pandas as pd
import numpy as np
from numpy.random import gamma # gamma function
import seaborn as sns # plotting library

# plot histograms immediately:
get_ipython().run_line_magic('matplotlib', 'inline')


# In[12]:


# Define functions

def get_samples_from_gamma_dist( num_of_samples, size_of_samples, alpha, lamb ):
    '''
    Returns table with ( num_of_samples ) rows and ( size_of_samples ) columns.
    Cells in the table are i.i.d sample values from numpy's gamma function
    with shape parameter ( alpha ) and scale parameter ( lamb ).
    '''
    return pd.DataFrame( 
            data = gamma( 
                    shape = alpha, 
                    scale = lamb, 
                    size = 
                        ( 
                            num_of_samples, 
                            size_of_samples 
                        )
                )
            )

# Returns alpha_hat of a sample:
get_alpha_hat = lambda sample : ( sample.mean()**2 ) / sample.var()

# Returns lambda_hat of a sample:
get_lambda_hat = lambda sample : sample.mean() / sample.var()


# In[13]:


# Retrieve samples

# Declaring variables...
my_num_of_samples = 1000
my_size_of_samples = 227
my_alpha = 0.375
my_lambda = 1.674

# Initializing table...
data = get_samples_from_gamma_dist( 
    num_of_samples= my_num_of_samples, 
    size_of_samples= my_size_of_samples, 
    alpha= my_alpha, 
    lamb= my_lambda 
)

# Getting estimated parameter values from each sample...
alpha_hats = data.apply( get_alpha_hat, axis = 1 ) # apply function across the table's columns
lambda_hats = data.apply( get_lambda_hat, axis = 1 ) # apply function across the table's columns


# In[14]:


# Plot histograms:

# Setting background of histograms to 'whitegrid'...
sns.set_style( style = 'whitegrid' )

# Plotting the sample distribution of alpha_hat...
sns.distplot( alpha_hats, 
             hist = True, 
             kde = True, 
             bins = 50, 
             axlabel = 'Estimates of Alpha',
             hist_kws=dict(edgecolor="k", linewidth=2),
             color = 'red' )


# In[15]:


# Plotting the sample distribution of lambda_hat...
sns.distplot( lambda_hats, 
             hist = True, 
             kde = True, 
             bins = 50, 
             axlabel = 'Estimates of Lambda',
             hist_kws=dict(edgecolor="k", linewidth=2),
             color = 'purple' )


# In[16]:


# Print results:

print( "Mean of alpha_hats =", alpha_hats.mean(), '\n'  )

print( "Mean of lambda_hats =", lambda_hats.mean(), '\n' ) # about 0.62

print( "Standard Error of alpha_hats =", alpha_hats.std( ddof = 0 ), '\n'  )

print( "Standard Error of lambda_hats =", lambda_hats.std( ddof = 0 ), '\n'  )

After I plot the histograms of the estimated values of alpha and lambda respectively, I notice that alpha sample distribution is centered near perfectly on 0.375 but lambda's sample distribution is centered around 0.62 which is way off from 1.674. I've tried playing with other values for lambda but it never seems to center correctly.

I would love to know if anyone has any suggestions to fix this problem. I've included all the code from the .py file downloaded from my jupyter notebook session.


Solution

  • Fixed. The probability mass function of the gamma function is implemented differently in numpy.random than in my textbook.

    I got the correct mean value by setting the 'scale' parameter within get_samples_from_gamma_dist()'s body to 1 / lamb:

    def get_samples_from_gamma_dist( num_of_samples, size_of_samples, alpha, lamb ):
    '''
    Returns table with ( num_of_samples ) rows and ( size_of_samples ) columns.
    Cells in the table are i.i.d sample values from numpy's gamma function
    with shape parameter ( alpha ) and scale parameter ( 1 / lamb ).
    '''
    return pd.DataFrame( 
            data = gamma( 
                    shape = alpha, 
                    scale = 1 / lamb, 
                    size = 
                        ( 
                            num_of_samples, 
                            size_of_samples 
                        )
                )
            )