Search code examples
pythondataframestatisticsstatsmodelsscipy.stats

How to perform Sidak's test (multi comp) following two-way anova on a dataframe?


I have a dataframe with the following columns: Time, Drug and mobility for a full 24 hour period.

A Snapshot of the dataframe

Time Drug Mobility
18   A    1.2 
19   A    1.3
20   A    1.3
21   A    1.2
18   B    3.2
19   B    3.2
20   B    3.3
21   B    3.3

I then perform a two-way anova to compare the effects of the drugs at each time point on mobility using this code:

mod = ols('Mobility~Time+Drug+Time*Drug', data = fdf).fit()
aov = anova_lm(mod, type=2) 

Then i want to do a multicomparison test (post-hoc), particulary sidak but not sure how to do it. Struggling to find any resources online to learn from.

I know i can use tukey's test but it compares everything and I'm also only interest in the drug effects at the same timepoints:

18+A - 18+B
19+A - 19+B
20+A - 20+B

Not:

18+A - 19+B
18+A - 20+B
20+A - 18+A

Any assistance will help tremendously


Solution

  • If you are only interested in the comparison within group, you already have them in your coefficients.

    Using an example dataset:

    import statsmodels.formula.api as sm
    import pandas as pd
    import numpy as np
    from statsmodels.stats.anova import anova_lm
    from statsmodels.stats.multitest import multipletests
    
    fdf = pd.DataFrame({'Time':np.random.choice([18,19,20],50),
    'Drug':np.random.choice(['A','B'],50), 
    'Mobility':np.random.uniform(0,1,50)})
    
    fdf['Time'] = pd.Categorical(fdf['Time'])
    
    mod = sm.ols('Mobility~Time+Drug+Time:Drug', data = fdf).fit()
    aov = anova_lm(mod, type=2)
    

    Results look like this:

    mod.summary()
    
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:               Mobility   R-squared:                       0.083
    Model:                            OLS   Adj. R-squared:                 -0.021
    Method:                 Least Squares   F-statistic:                    0.7994
    Date:                Wed, 08 Dec 2021   Prob (F-statistic):              0.556
    Time:                        07:13:14   Log-Likelihood:                -4.4485
    No. Observations:                  50   AIC:                             20.90
    Df Residuals:                      44   BIC:                             32.37
    Df Model:                           5                                         
    Covariance Type:            nonrobust                                         
    ========================================================================================
                               coef    std err          t      P>|t|      [0.025      0.975]
    ----------------------------------------------------------------------------------------
    Intercept                0.4063      0.094      4.323      0.000       0.217       0.596
    Time[T.19]               0.1622      0.149      1.091      0.281      -0.137       0.462
    Time[T.20]              -0.0854      0.133     -0.643      0.524      -0.353       0.182
    Drug[T.B]                0.0046      0.149      0.031      0.975      -0.295       0.304
    Time[T.19]:Drug[T.B]    -0.1479      0.206     -0.717      0.477      -0.564       0.268
    Time[T.20]:Drug[T.B]     0.2049      0.199      1.028      0.310      -0.197       0.607
    

    In this case, Time=18 is the reference, so Drug[T.B] would be the effect of B wrt to A, at Time=18, which is the result for 18+B - 18+A, Time[T.19]:Drug[T.B] would be 19+B - 19+A and Time[T.20]:Drug[T.B] would be 20+B - 20+A .

    For multiple comparisons adjustment, you can simply pull out these results and calculate the corrected pvalues:

    res = pd.concat([mod.params,mod.pvalues],axis=1)
    res.columns=['coefficient','pvalues']
    res = res[res.index.str.contains('Drug')]
    res['corrected_p'] = multipletests(res['pvalues'],method="sidak")[1]
    res['comparison'] = ['18+B - 18+A','19+B - 19+A','20+B - 20+A']
    
                          coefficient   pvalues  corrected_p   comparison
    Drug[T.B]                0.004630  0.975284     0.999985  18+B - 18+A
    Time[T.19]:Drug[T.B]    -0.147928  0.477114     0.857038  19+B - 19+A
    Time[T.20]:Drug[T.B]     0.204925  0.309616     0.670942  20+B - 20+A
    

    See also help page for multipletests