I am looking to perform a difference in means test on the summary statistics of two DataFrames.
df1[['sd']].describe()
sd
count 5000.000000
mean 0.635558
std 0.086109
min 0.492922
25% 0.577885
50% 0.639906
75% 0.688645
max 0.800767
df2[['sd']].describe()
sd
count 5000.000000
mean 0.640954
std 0.084459
min 0.496823
25% 0.577373
50% 0.644122
75% 0.693863
max 0.798076
I am looking for some function I can call on these summary statistics to tell me if my difference in means is statistically significant.
If you have two independent samples, whether from the same population or from different ones, use a t-test for independent samples.
This is a two-sided test for the null hypothesis that two independent samples have equal average values.
from scipy.stats import ttest_ind
ttest_ind(df1['sd'], df2['sd'])
The output is the t-statistic and the p-value.
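If you only have the summary statistics and not the raw columns, scipy also provides ttest_ind_from_stats, which takes the mean, standard deviation, and count of each sample. Here is a minimal sketch using the numbers from the describe() output in the question. The variable names are just for illustration, and it assumes equal variances (the same default as ttest_ind); pass equal_var=False if you would rather use Welch's test.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the describe() output in the question
mean1, std1, n1 = 0.635558, 0.086109, 5000
mean2, std2, n2 = 0.640954, 0.084459, 5000

# equal_var=True matches the default of ttest_ind (pooled-variance t-test);
# set equal_var=False for Welch's t-test if the variances may differ.
result = ttest_ind_from_stats(
    mean1=mean1, std1=std1, nobs1=n1,
    mean2=mean2, std2=std2, nobs2=n2,
    equal_var=True,
)

print(result.statistic, result.pvalue)
```

A p-value below your chosen significance level (commonly 0.05) means you would reject the null hypothesis of equal means. describe() reports the sample standard deviation (ddof=1), which is what this function expects.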