Difference-in-means test on pandas summary statistics?

I am looking to perform a difference in means test on the summary statistics of two DataFrames.

count  5000.000000
mean      0.635558
std       0.086109
min       0.492922
25%       0.577885
50%       0.639906
75%       0.688645
max       0.800767

count  5000.000000
mean      0.640954
std       0.084459
min       0.496823
25%       0.577373
50%       0.644122
75%       0.693863
max       0.798076

I am looking for some function I can call on these summary statistics to tell me if my difference in means is statistically significant.


  • If you have two independent samples, drawn from the same or from different populations, use a t-test for independent samples.

    This is a two-sided test for the null hypothesis that two independent samples have equal average values.

    from scipy.stats import ttest_ind
    # Pass the raw observations (here the 'sd' column of each DataFrame), not the describe() output
    ttest_ind(df1['sd'], df2['sd'])

    The output is the t-statistic and the p-value.
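If only the summary statistics are available, as in the question, the same test can be run with `scipy.stats.ttest_ind_from_stats`, which takes the mean, standard deviation, and count of each sample. The sketch below plugs in the numbers from the two `describe()` outputs above. It assumes the `std` values are sample standard deviations (pandas' default, `ddof=1`) and that the two samples are independent; the variable names are only illustrative.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the two describe() outputs in the question
mean1, std1, n1 = 0.635558, 0.086109, 5000
mean2, std2, n2 = 0.640954, 0.084459, 5000

# Two-sided t-test for equal means, computed from summary statistics only.
# equal_var=True matches the default of ttest_ind; pass equal_var=False
# for Welch's test if the variances may differ.
result = ttest_ind_from_stats(mean1, std1, n1, mean2, std2, n2, equal_var=True)

print(result.statistic, result.pvalue)
```

A p-value below the chosen significance level (for example 0.05) means the null hypothesis of equal means is rejected at that level.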