Baker's Gamma Distribution under H0 and FM index (dendextend)


I have some questions regarding the Baker's Gamma and FM indices in the dendextend package.

  1. What is the interpretation of the Baker's Gamma distribution under H0? That is, when do you reject the null hypothesis?
  2. What is the difference between cor_FM_index and FM_index? The expectation and variance seem to stay the same, but the index value does not.
  3. The Bk plot shows the FM index over different values of k. What can be concluded from such a plot?

Solution

    1. H0 is that there is no correlation between how "high" two items merge in one dendrogram and how high they merge in the other. If two dendrograms are identical, then for every pair of leaves the height of the branch at which they merge is the same in both, so Baker's Gamma (the correlation over all such pairs) is 1. If the two trees are unrelated, the correlation will be close to 0. The distribution under H0 shows which values you would get by chance alone, so you reject H0 when the observed value falls far in its tail, for example beyond the 95% quantile. A significant value between 0 and 1 means there is some similarity: in general, the closer two leaves are in one dendrogram, the closer they are in the other. As with any correlation, the exact meaning in borderline cases cannot be inferred from the correlation value alone.

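    2. FM_index computes the index from two vectors of cluster assignments, while cor_FM_index takes the two trees and calls FM_index in the "correct" way. Look at the code of cor_FM_index to see how. One likely explanation for an index that changes while the expectation and variance stay the same is that the two cluster vectors were not aligned item by item, since the expectation and variance depend only on the cluster sizes.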

    3. It shows at which levels of cutting the two trees resemble each other. For example, if you had two trees (t1 and t2), each with two sub-families that include exactly the same items, then their Bk (k=2) would be 1. But it could be that when you cut these trees with k=3, the subtrees no longer include exactly the same items in t1 and t2. Hence, Bk is a measure of tree similarity at different levels of cutting the trees. If the trees are identical, Bk=1 for every k. If they are similar only at some levels, the Bk values at those values of k will be significantly high.
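To make the three points above concrete, here is a minimal base-R sketch rather than dendextend's own code. It assumes the built-in USArrests data and two arbitrary linkage methods, and the helpers bakers_gamma() and fm_index() are hypothetical names written only for this illustration. Cophenetic height stands in for the merge level, and 200 label shuffles approximate the distribution under H0; dendextend's packaged functions may differ in details such as tie handling.

```r
# Base-R sketch. bakers_gamma() and fm_index() are hypothetical helpers written
# for illustration; they are not dendextend's implementations.
set.seed(1)
d  <- dist(scale(USArrests))            # built-in data, illustrative only
h1 <- hclust(d, method = "complete")
h2 <- hclust(d, method = "average")

# Point 1: Spearman correlation, over all pairs of leaves, of the height at
# which each pair first merges (cophenetic height used as the merge level).
bakers_gamma <- function(a, b) {
  ca <- as.matrix(cophenetic(a))
  cb <- as.matrix(cophenetic(b))[rownames(ca), colnames(ca)]  # align by label
  lt <- lower.tri(ca)
  cor(ca[lt], cb[lt], method = "spearman")
}

obs <- bakers_gamma(h1, h2)

# Null distribution: shuffle the labels of one tree so that any agreement
# between the two trees is due to chance alone.
null <- replicate(200, {
  hp <- h2
  hp$labels <- sample(h2$labels)
  bakers_gamma(h1, hp)
})
p_value <- (1 + sum(abs(null) >= abs(obs))) / (1 + length(null))

# Points 2 and 3: Fowlkes-Mallows index from the contingency table of two
# cluster-label vectors, which must list the same items in the same order.
fm_index <- function(c1, c2) {
  m <- table(c1, c2)
  n <- length(c1)
  (sum(m^2) - n) / sqrt((sum(rowSums(m)^2) - n) * (sum(colSums(m)^2) - n))
}

# One index per cutting level k, which is what a Bk plot displays.
ks <- 2:10
bk <- sapply(ks, function(k) fm_index(cutree(h1, k), cutree(h2, k)))

# Misaligned input: same cluster sizes, different item order.
c1 <- cutree(h1, k = 3)
c2 <- cutree(h2, k = 3)
fm_aligned  <- fm_index(c1, c2)
fm_shuffled <- fm_index(c1, sample(c2))

print(round(c(gamma = obs, p = p_value), 3))
print(round(setNames(bk, ks), 3))
print(c(aligned = fm_aligned, shuffled = fm_shuffled))
```

A small p_value here plays the role of "reject H0" in point 1, and the bk vector holds the values a Bk plot would draw against k. Comparing fm_aligned with fm_shuffled shows how the index can change when the item order is lost even though the cluster sizes, and therefore anything computed from them alone, are unchanged.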

    I hope this helps, thanks for the good questions.