Hosmer-Lemeshow goodness-of-fit test in Python


I have estimated a GLM in Python. How can I perform the Hosmer-Lemeshow goodness-of-fit test for this model in Python?


Solution

  • I found a way. The code is not of the best quality, but it works:

    import pandas as pd
    import numpy as np
    from scipy.stats import chi2
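    # `model` is assumed to be an already fitted GLM results object
    pihat = model.predict()  # fitted probabilities for the estimation sample
    # Split the fitted probabilities into 4 groups at their quartiles (only 4 groups chosen here)
    pihatcat = pd.cut(pihat, np.percentile(pihat, [0, 25, 50, 75, 100]),
                      labels=False, include_lowest=True)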
    
    
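    # Per-group mean probability, expected count and observed count:
    # first for events (r == 1), then (suffix 2) for non-events (r == 0).
    # `data.r` is the observed 0/1 response used to fit the model.
    meanprobs = [0] * 4
    expevents = [0] * 4
    obsevents = [0] * 4
    meanprobs2 = [0] * 4
    expevents2 = [0] * 4
    obsevents2 = [0] * 4

    for i in range(4):
        in_group = pihatcat == i  # boolean mask for the observations in group i
        meanprobs[i] = np.mean(pihat[in_group])
        expevents[i] = np.sum(in_group) * meanprobs[i]
        obsevents[i] = np.sum(data.r[in_group])
        meanprobs2[i] = np.mean(1 - pihat[in_group])
        expevents2[i] = np.sum(in_group) * meanprobs2[i]
        obsevents2[i] = np.sum(1 - data.r[in_group])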
    
    
    data1={'meanprobs':meanprobs,'meanprobs2':meanprobs2}
    data2={'expevents':expevents,'expevents2':expevents2}
    data3={'obsevents':obsevents,'obsevents2':obsevents2}
    m=pd.DataFrame(data1)
    e=pd.DataFrame(data2)
    o=pd.DataFrame(data3)
    
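    # Test statistic: sum of (observed - expected)^2 / expected over all groups and both outcomes.
    # Under the null hypothesis it approximately follows a chi-squared distribution
    # with (number of groups - 2) degrees of freedom, i.e. 4 - 2 = 2 here.
    tt = np.sum((np.array(o) - np.array(e)) ** 2 / np.array(e))
    pvalue = chi2.sf(tt, 4 - 2)  # equivalent to 1 - chi2.cdf(tt, 2)
    pvalue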
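
The same calculation can be wrapped in a function that works for any number of groups. What follows is only a sketch under a few assumptions, not a reference implementation: the name `hosmer_lemeshow` and its parameters `y`, `p` and `g` are made up for illustration, and the grouping uses `pd.qcut` on the ranks of the probabilities instead of `pd.cut` on percentiles, so that tied probabilities cannot produce duplicate bin edges. The expected number of events in a group is taken as the sum of its fitted probabilities, which equals the group size times the mean probability used above.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2


def hosmer_lemeshow(y, p, g=10):
    """Hypothetical helper: return (statistic, degrees of freedom, p-value).

    y : observed 0/1 outcomes
    p : fitted probabilities from the model
    g : number of groups (must be greater than 2)
    """
    df = pd.DataFrame({"y": np.asarray(y, dtype=float),
                       "p": np.asarray(p, dtype=float)})
    # Rank first so every value is unique, then cut the ranks into g equal-sized groups.
    df["group"] = pd.qcut(df["p"].rank(method="first"), g, labels=False)
    grouped = df.groupby("group")

    n = grouped["y"].count()     # group sizes
    obs1 = grouped["y"].sum()    # observed events
    exp1 = grouped["p"].sum()    # expected events
    obs0 = n - obs1              # observed non-events
    exp0 = n - exp1              # expected non-events

    stat = ((obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0).sum()
    dof = g - 2
    return stat, dof, chi2.sf(stat, dof)


# Usage on simulated data where the probabilities are correct by construction.
rng = np.random.default_rng(0)
p_sim = rng.uniform(0.05, 0.95, size=1000)
y_sim = rng.binomial(1, p_sim)
stat, dof, pvalue = hosmer_lemeshow(y_sim, p_sim, g=10)
print(stat, dof, pvalue)
```

With the model from the answer, the call would look like `hosmer_lemeshow(data.r, model.predict(), g=4)`, assuming `data.r` and the fitted probabilities are in the same row order.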