Search code examples
pythonrstatisticslogistic-regressionmodeling

How can I code Vuong's statistical test in Python?


I need to implement Vuong's test for non-nested models. Specifically, I have logistic-regression models that I would like to compare.

I have found implementations in R and STATA online, but unfortunately I work in Python and am not familiar with those frameworks/languages. Also unfortunate is that I have not been unable to find a Python implementation of the test (if someone knows of one that would be fantastic!). I asked on Cross Validated and was redirected to here.

After scouring the original paper and a few other pages that I found (references below) I think I've managed to port the R implementation from the pscl. Example code is below.

Is there someone who is fluent in both Python and R who could help confirm that the below code reproduces the R implementation? And, if not, help determine where I erred? In addition to helping with my immediate problem, perhaps this can help someone in the future.

References consulted:

Example code with current implementation and use for logistic regression fits of dummy data. Note that this is a stock dataset and the predictor variables were arbitrarily selected.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm
from sklearn.datasets import load_breast_cancer

def vuong_test(mod1, mod2, correction=True):
    '''
    mod1, mod2 - non-nested logitstic regression fit results from statsmodels
    '''
    # number of observations and check of models
    N = mod1.nobs
    N2 = mod2.nobs
    if N != N2:
        raise ValueError('Models do not have the same number of observations')
    # extract the log-likelihood for individual points with the models
    m1 = mod1.model.loglikeobs(mod1.params)
    m2 = mod2.model.loglikeobs(mod2.params)
    # point-wise log likelihood ratio
    m = m1 - m2
    # calculate the LR statistic
    LR = np.sum(m)
    # calculate the AIC and BIC correction factors -> these go to zero when df is same between models
    AICcor = mod1.df_model - mod2.df_model
    BICcor = np.log(N)*AICcor/2
    # calculate the omega^2 term
    omega2 = np.var(m)
    # calculate the Z statistic with and without corrections
    Zs = np.array([LR,LR-AICcor,LR-BICcor])
    Zs /= np.sqrt(N*omega2)
    # calculate the p-value
    ps = []
    msgs = []
    for Z in Zs:
        if Z>0:
            ps.append(1 - norm.cdf(Z))
            msgs.append('model 1 preferred over model 2')
        else:
            ps.append(norm.cdf(Z))
            msgs.append('model 2 preferred over model 1')
    # share information
    print('=== Vuong Test Results ===')
    labs = ['Uncorrected']
    if AICcor!=0:
        labs += ['AIC Corrected','BIC Corrected']
    for lab,msg,p,Z in zip(labs,msgs,ps,Zs):
        print('  -> '+lab)
        print('    -> '+msg)
        print('    -> Z: '+str(Z))
        print('    -> p: '+str(p))

# load sample data
X,y = load_breast_cancer( return_X_y=True, as_frame=True)
# create data for modeling
X1 = sm.add_constant( X.loc[:,('mean radius','perimeter error','worst symmetry')])
X2 = sm.add_constant( X.loc[:,('mean area','worst smoothness')])
# fit the models
mod1 = sm.Logit( y, X1).fit()
mod2 = sm.Logit( y, X2).fit()

# run Vuong's Test
vuong_test( mod1, mod2)

Output

Optimization terminated successfully.
         Current function value: 0.172071
         Iterations 9
Optimization terminated successfully.
         Current function value: 0.156130
         Iterations 9
=== Vuong Test Results ===
  -> Uncorrected
    -> model 2 preferred over model 1
    -> Z: -0.7859538767172318
    -> p: 0.21594725450542168
  -> AIC Corrected
    -> model 2 preferred over model 1
    -> Z: -0.8726045081599004
    -> p: 0.19143934123980472
  -> BIC Corrected
    -> model 2 preferred over model 1
    -> Z: -1.0608044994241501
    -> p: 0.14438937850119132

Solution

  • In order to move forward, I learned enough R to be able to make my own comparison between R and python for Vuong's Test as below.

    In the end, my original implementation was close except for the subtle point of difference between how numpy and R calculate standard deviations by default. After correcting to now calculate the sample standard deviation (by changing np.var(m) to np.var(m,ddof=1)), I get a match between pscl and my own python code.

    Updated Python implementation

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from scipy.stats import norm
    from sklearn.datasets import load_breast_cancer
    
    def vuong_test(mod1, mod2, correction=True):
        '''
        mod1, mod2 - non-nested logitstic regression fit results from statsmodels
        '''
        # number of observations and check of models
        N = mod1.nobs
        N2 = mod2.nobs
        if N != N2:
            raise ValueError('Models do not have the same number of observations')
        # extract the log-likelihood for individual points with the models
        m1 = mod1.model.loglikeobs(mod1.params)
        m2 = mod2.model.loglikeobs(mod2.params)
        # point-wise log likelihood ratio
        m = m1 - m2
        # calculate the LR statistic
        LR = np.sum(m)
        # calculate the AIC and BIC correction factors -> these go to zero when df is same between models
        AICcor = mod1.df_model - mod2.df_model
        BICcor = np.log(N)*AICcor/2
        # calculate the omega^2 term
        omega2 = np.var(m, ddof=1)
        # calculate the Z statistic with and without corrections
        Zs = np.array([LR,LR-AICcor,LR-BICcor])
        Zs /= np.sqrt(N*omega2)
        # calculate the p-value
        ps = []
        msgs = []
        for Z in Zs:
            if Z>0:
                ps.append(1 - norm.cdf(Z))
                msgs.append('model 1 preferred over model 2')
            else:
                ps.append(norm.cdf(Z))
                msgs.append('model 2 preferred over model 1')
        # share information
        print('=== Vuong Test Results ===')
        labs = ['Uncorrected']
        if AICcor!=0:
            labs += ['AIC Corrected','BIC Corrected']
        for lab,msg,p,Z in zip(labs,msgs,ps,Zs):
            print('  -> '+lab)
            print('    -> '+msg)
            print('    -> Z: '+str(Z))
            print('    -> p: '+str(p))
    
    # load sample data
    X,y = load_breast_cancer( return_X_y=True, as_frame=True)
    # create data for modeling
    X1 = sm.add_constant( X.loc[:,('mean radius','perimeter error','worst symmetry')])
    X2 = sm.add_constant( X.loc[:,('mean area','worst smoothness')])
    # fit the models
    mod1 = sm.Logit( y, X1).fit()
    mod2 = sm.Logit( y, X2).fit()
    
    # run Vuong's Test
    vuong_test( mod1, mod2)
    
    # save data for R test function
    pd.concat( [X,y], axis=1).to_csv('breast_cancer_data.csv')
    

    With output

    Optimization terminated successfully.
             Current function value: 0.172071
             Iterations 9
    Optimization terminated successfully.
             Current function value: 0.156130
             Iterations 9
    === Vuong Test Results ===
      -> Uncorrected
        -> model 2 preferred over model 1
        -> Z: -0.7852629281206245
        -> p: 0.21614971334597216
      -> AIC Corrected
        -> model 2 preferred over model 1
        -> Z: -0.8718373831692781
        -> p: 0.19164854872682913
      -> BIC Corrected
        -> model 2 preferred over model 1
        -> Z: -1.0598719238597758
        -> p: 0.1446014350740505
    

    Comparison R code

    # load library
    library(pscl)
    
    # load breast cancer data set
    bcdata <- read.csv('breast_cancer_data.csv')
    
    # build logistic regression models
    mod1 <- glm( target ~ mean.radius + perimeter.error + worst.symmetry, data=bcdata, family="binomial")
    mod2 <- glm( target ~ mean.area + worst.smoothness, data=bcdata, family="binomial")
    
    # compare the models
    vuong( mod1, mod2)
    

    With output

    Vuong Non-Nested Hypothesis Test-Statistic:
    (test-statistic is asymptotically distributed N(0,1) under the
     null that the models are indistinguishible)
    -------------------------------------------------------------
                  Vuong z-statistic             H_A p-value
    Raw                  -0.7852629 model2 > model1 0.21615
    AIC-corrected        -0.8718374 model2 > model1 0.19165
    BIC-corrected        -1.0598719 model2 > model1 0.14460
    Warning messages:
    1: glm.fit: fitted probabilities numerically 0 or 1 occurred
    2: glm.fit: fitted probabilities numerically 0 or 1 occurred