I am trying to run Tukey's HSD test to see whether the means of a variable differ significantly across several groups in my data. For example, here I am testing whether the mean of the variable 'acad_se_communicate_needs' differs by 'Class'. However, my results are full of NaN values. What is going on here, and how can I fix it?
I am using statsmodels for this. I have avoided methods that require splitting the data into a separate dataframe for each group, because I have to repeat this analysis for several variables, and I find those methods hard to follow.
from statsmodels.stats.multicomp import MultiComparison

# Compare the mean of 'acad_se_communicate_needs' across the levels of 'Class'
mc = MultiComparison(clean['acad_se_communicate_needs'], clean['Class'])
result = mc.tukeyhsd()
print(result)
My output is as follows, with NaN everywhere:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
===================================================
 group1     group2    meandiff  lower  upper  reject
---------------------------------------------------
Freshman    Junior      nan      nan    nan   False
Freshman    Senior      nan      nan    nan   False
Freshman    Sophomore   nan      nan    nan   False
Junior      Senior      nan      nan    nan   False
Junior      Sophomore   nan      nan    nan   False
Senior      Sophomore   nan      nan    nan   False
---------------------------------------------------
The data does contain missing (NaN) values, so I tried to have the function drop them. This is what I tried:
sm.stats.multicomp.pairwise_tukeyhsd('acad_se_communicate_needs','Class', alpha=0.05, missing = 'drop')
However, I get an error that says "pairwise_tukeyhsd() got an unexpected keyword argument 'missing'".
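For reference, here is a minimal sketch of calling pairwise_tukeyhsd without a 'missing' argument, dropping the incomplete rows with pandas first. The `toy` dataframe and its values are made up purely for illustration and stand in for my real `clean` dataframe; as far as I can tell the function only takes `endog`, `groups` and `alpha`.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical stand-in for the real `clean` dataframe (values are invented).
toy = pd.DataFrame({
    "acad_se_communicate_needs": [3.0, 4.0, np.nan, 2.0,
                                  5.0, 4.0, np.nan, 1.0,
                                  2.0, 3.0, 5.0, 4.0],
    "Class": ["Freshman"] * 4 + ["Junior"] * 4 + ["Senior"] * 4,
})

# Keep only the two columns of interest and drop rows where either is missing.
pair = toy[["acad_se_communicate_needs", "Class"]].dropna()

# pairwise_tukeyhsd takes the response values, the group labels and alpha.
tukey = pairwise_tukeyhsd(endog=pair["acad_se_communicate_needs"],
                          groups=pair["Class"],
                          alpha=0.05)
print(tukey.summary())
```

With the NaN rows removed before the call, the mean differences in this toy example come out as ordinary numbers.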
I ended up creating a new dataframe containing only these two columns, dropping the rows with missing values, and then running Tukey's HSD test on that:
from statsmodels.stats.multicomp import MultiComparison

# Keep only the response and grouping columns, then drop rows with NaN in either
cleanTukey1 = clean.filter(items=['acad_se_communicate_needs', 'Class']).dropna()

mc1 = MultiComparison(cleanTukey1['acad_se_communicate_needs'], cleanTukey1['Class'])
result1 = mc1.tukeyhsd()
print(result1)
# Show the group labels that were compared
print(mc1.groupsunique)
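Since I need to repeat this for several variables, the same steps could be wrapped in a small helper. This is only a sketch: the function name `tukey_for_columns`, the second column `other_var`, and the `toy` data are all hypothetical and not part of my real dataset. Rows are dropped per variable, so a missing value in one variable does not remove that row from the analysis of another.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import MultiComparison

def tukey_for_columns(df, value_cols, group_col, alpha=0.05):
    """Run Tukey's HSD for each column in value_cols against group_col.

    Hypothetical helper: NaN rows are dropped separately for each column.
    """
    results = {}
    for col in value_cols:
        # Keep only this variable and the grouping column, without NaN rows.
        pair = df[[col, group_col]].dropna()
        mc = MultiComparison(pair[col], pair[group_col])
        results[col] = mc.tukeyhsd(alpha=alpha)
    return results

# Invented data standing in for the real `clean` dataframe.
toy = pd.DataFrame({
    "acad_se_communicate_needs": [3.0, 4.0, np.nan, 2.0,
                                  5.0, 4.0, np.nan, 1.0,
                                  2.0, 3.0, 5.0, 4.0],
    "other_var": [1.0, 2.0, 3.0, np.nan,
                  4.0, 5.0, 3.0, 4.0,
                  2.0, np.nan, 1.0, 3.0],
    "Class": ["Freshman"] * 4 + ["Junior"] * 4 + ["Senior"] * 4,
})

results = tukey_for_columns(
    toy, ["acad_se_communicate_needs", "other_var"], "Class"
)
for name, res in results.items():
    print(name)
    print(res.summary())
```

Each entry in `results` is the object returned by `tukeyhsd()`, so it can be printed or inspected the same way as `result1` above.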