python pandas t-test

Pairwise statistical significance testing on a pandas DataFrame


I have a pandas DataFrame (100x10) where each column represents some quantity, and I would like to run a paired t-test on every pair of columns. Instead of looping over the columns:

stats.ttest_rel(df.iloc[:,i], df.iloc[:,j])

where i!=j, is there a cleaner way to do it? Something similar to correlations:

df.corr()

where it computes all pair-wise correlations.


Solution

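  • There is no need to write a double for-loop yourself; itertools.combinations yields each unordered pair of columns exactly once. Note that stats.ttest_rel returns a result holding both the t statistic and the p-value, so the version below stores only the p-value in each cell (use .statistic instead if you want the t statistic):

    import itertools
    import pandas as pd
    from scipy import stats
    results = pd.DataFrame(index=df.columns, columns=df.columns, dtype=float)  # p-values; diagonal stays NaN
    for (label1, column1), (label2, column2) in itertools.combinations(df.items(), 2):
        results.loc[label1, label2] = results.loc[label2, label1] = stats.ttest_rel(column1, column2).pvalue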
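For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The helper name `pairwise_ttest_rel`, the `demo` frame, and its random data are made up for illustration and are not part of the original answer. It keeps the t statistics and the p-values in two separate tables, and also shows a variant that is closer in spirit to `df.corr()`, assuming a pandas version whose `corr` accepts a callable `method`.

```python
import itertools

import numpy as np
import pandas as pd
from scipy import stats


def pairwise_ttest_rel(df):
    """Return (t statistics, p-values) for a paired t-test on every column pair."""
    # Hypothetical helper: both tables start as all-NaN, so the diagonal stays NaN.
    t_stats = pd.DataFrame(np.nan, index=df.columns, columns=df.columns)
    p_values = t_stats.copy()
    for (label1, col1), (label2, col2) in itertools.combinations(df.items(), 2):
        result = stats.ttest_rel(col1, col2)
        # The p-value is symmetric; the t statistic flips sign when the order is swapped.
        t_stats.loc[label1, label2] = result.statistic
        t_stats.loc[label2, label1] = -result.statistic
        p_values.loc[label1, label2] = p_values.loc[label2, label1] = result.pvalue
    return t_stats, p_values


# Made-up data: 100 rows, 4 columns, with column "d" shifted upward.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
demo["d"] += 1.0

t_stats, p_values = pairwise_ttest_rel(demo)

# Alternative closer to df.corr(): pass a callable as `method`.
# pandas fills the diagonal with 1.0 for callables, so ignore the diagonal here.
p_values_via_corr = demo.corr(method=lambda x, y: stats.ttest_rel(x, y).pvalue)
```

The `corr` variant is shorter, but it can only return one number per pair and its diagonal of 1.0 is a pandas convention rather than a real p-value, so the explicit loop may be the safer choice when both the statistic and the p-value are needed. With 45 pairs in a 100x10 frame, a multiple-comparison correction is probably worth considering before reading individual p-values.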