I am doing k-means clustering and want to test whether the resulting clusters are statistically different. With 3-level clustering, I test cluster 0 against cluster 1 and then against cluster 2. Then I test cluster 2 with cluster 3. I tried to apply a t-test as shown in the following code. The clusters have different lengths, as you know. I am confused about the logic: should I use p>0.05 or p<0.05? And where should True and False go?
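from scipy.stats import ttest_ind

def compare_2_groups(ar1, ar2):
    # Two-sample t-test: s is the test statistic, p is the p-value
    s, p = ttest_ind(ar1, ar2)
    #if p>0.05:
    if p < 0.05:
        return False
    else:
        return True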
This procedure should work even if ar1 and ar2 have different lengths. The p-value indicates the strength of evidence AGAINST the null hypothesis that the two clusters have the same center; a smaller p means stronger evidence that the centers differ. As written, the function therefore returns False when the difference is significant (p < 0.05) and True when it is not. Two suggestions:
If you choose a name with the opposite meaning, such as "are_group_centers_different", reverse the logic of the threshold test so that it returns True if p < (threshold).
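
The renamed version with the reversed threshold test might look like the sketch below. This is only an illustration, not a definitive implementation: the `threshold` parameter and its default of 0.05 are assumptions, the sample lists are made-up values, and each argument is assumed to be a 1-D sequence of values for a single feature.

```python
from scipy.stats import ttest_ind


def are_group_centers_different(ar1, ar2, threshold=0.05):
    """Return True when the t-test rejects 'same center' at the given threshold.

    `threshold` is a hypothetical parameter added for illustration.
    """
    # ttest_ind accepts samples of different lengths
    s, p = ttest_ind(ar1, ar2)
    # Small p means strong evidence against the null hypothesis of equal means
    return bool(p < threshold)


# Made-up example data with different lengths
cluster_a = [1.0, 1.1, 0.9, 1.2, 0.8]
cluster_b = [5.0, 5.2, 4.9, 5.1, 5.3, 4.8]
cluster_c = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05]

print(are_group_centers_different(cluster_a, cluster_b))
print(are_group_centers_different(cluster_a, cluster_c))
```

With this naming, True reads naturally as "the clusters differ", so the condition p < threshold and the return value point in the same direction.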