Tags: python, pandas, numpy, statistical-test

Difference in means test on pandas's summary statistics?


I am looking to perform a difference in means test on the summary statistics of two DataFrames.

df1[['sd']].describe()
                sd
count  5000.000000
mean      0.635558
std       0.086109
min       0.492922
25%       0.577885
50%       0.639906
75%       0.688645
max       0.800767

df2[['sd']].describe()
                sd
count  5000.000000
mean      0.640954
std       0.084459
min       0.496823
25%       0.577373
50%       0.644122
75%       0.693863
max       0.798076

I am looking for some function I can call on these summary statistics to tell me if my difference in means is statistically significant.


Solution

  • If you have two independent samples, whether from the same or different populations, use a t-test for independent samples.

    This is a two-sided test for the null hypothesis that two independent samples have equal average values.

    from scipy.stats import ttest_ind
    
    ttest_ind(df1['sd'], df2['sd'])
    
    

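    The output is the t-statistic and the p-value. A small p-value (conventionally below 0.05) indicates that the difference in means is statistically significant.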
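
If only the summary statistics are at hand rather than the raw columns, scipy also provides ttest_ind_from_stats, which takes the mean, standard deviation, and count of each sample. The sketch below plugs in the numbers from the describe() output in the question. The variable names are illustrative, and equal_var=False (Welch's t-test) is an assumption made here to avoid relying on equal variances; the default is equal_var=True.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the describe() output in the question.
# describe() reports the sample standard deviation (ddof=1),
# which is what ttest_ind_from_stats expects.
mean1, std1, n1 = 0.635558, 0.086109, 5000
mean2, std2, n2 = 0.640954, 0.084459, 5000

# equal_var=False requests Welch's t-test (an assumption for this sketch).
result = ttest_ind_from_stats(mean1, std1, n1, mean2, std2, n2, equal_var=False)

print("t-statistic:", result.statistic)
print("p-value:", result.pvalue)
```

The same values can be read from the DataFrames instead of being typed in, for example s = df1['sd'].describe() followed by s['mean'], s['std'], and s['count'].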