How to select all clusters in a dendrogram below a certain cutoff in R


I have a 227 × 227 distance matrix called X9_resid_matrix.

This matrix represents 227 proteins and tells me how different each one is from every other. The top left corner is therefore zero, because entry 1,1 compares a protein with itself, so the difference is zero. The values range from 0 to 9.

Then I used the following command to make a dendrogram:

plot(hclust(as.dist(X9_resid_matrix)))

The y axis ranges from zero to nine.

I would like the computer to tell me only the clusters that are below a certain cutoff.

For example, if there are only 31 clusters at zero distance, I would like code that tells me which 31 clusters those are.

The command:

cutree(hc, h=0)

does not seem to do that when I run it.

For example, I have the following dendrogram: [dendrogram image]

I would like a command that cuts the dendrogram at a height of 1, and only displays 1 and 2.


Solution

  • I think the function you are looking for is cut, which takes a dendrogram object x and slices it at height h to produce a two-element list. The first element, "upper", is the original tree after pruning, and the second element, "lower", is a list containing the trimmed branches (see ?dendrogram for more details).

    Each of the branches in the "lower" element is a dendrogram in its own right, and can be plotted and manipulated as usual. Here's a small example:

    # install.packages("phylogram")  # run once if the package is not installed
    newick <- "((Human:0.01,Chimp:0.03):0.02,(((Whale:0.04,Cow:0.01):0.01,Pig:0.01):0.01,(Dog:0.01,Cat:0.01):0.01):0.04);"
    dendro <- phylogram::read.dendrogram(text = newick)
    plot(dendro)
    

    tree plot

    The dendrogram object can then be cross-sectioned with the cut command:

    obj <- cut(dendro, h = 0.09)
    obj
    

    This outputs a list that looks like this:

    $upper
    'dendrogram' with 2 branches and 2 members total, at height 0.1 
    
    $lower
    $lower[[1]]
    'dendrogram' with 2 branches and 2 members total, at height 0.08 
    
    $lower[[2]]
    'dendrogram' with 2 branches and 5 members total, at height 0.06 
    

    Notice that the second element, obj$lower, contains the two subtrees that resulted from cutting the original tree at height 0.09. To extract and plot, say, the second of these subtrees, just use standard list subsetting syntax:

    subtree <- obj$lower[[2]]
    plot(subtree)
    

    subtree plot
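
    If you want a list of members rather than a plot, the branches in the "lower" element can also be inspected directly. The sketch below is only illustrative: it builds a small dendrogram from a made-up distance matrix (the names P1 to P5 and the distances are hypothetical, not the data from the question) so that it runs in base R without the phylogram package.

    ```r
    # Hypothetical toy distances between five items (not the asker's data).
    m <- matrix(c(0,   0,   4,   4,   8,
                  0,   0,   4,   4,   8,
                  4,   4,   0,   0.5, 8,
                  4,   4,   0.5, 0,   8,
                  8,   8,   8,   8,   0),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("P", 1:5), paste0("P", 1:5)))

    dendro <- as.dendrogram(hclust(as.dist(m)))
    obj <- cut(dendro, h = 1)

    # Number of leaves in each branch that was cut off at or below h
    sizes <- sapply(obj$lower, attr, "members")

    # Leaf labels of every cut-off branch that holds more than one leaf
    members <- lapply(obj$lower[sizes > 1], labels)

    print(sizes)
    print(members)
    ```

    Branches of size 1 are single leaves that never joined anything below the cutoff, which is why they are dropped before calling labels.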

    To convert your hc object into a dendrogram you can use the as.dendrogram function.
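
    Applied to the question, the whole workflow might look like the following sketch. The matrix X here is a small hypothetical stand-in for X9_resid_matrix (values invented for illustration), and clusters_below is a helper name introduced only for this example. The cutoff is chosen to fall between two merge heights, which avoids relying on exact floating-point equality with a node height.

    ```r
    # Hypothetical stand-in for X9_resid_matrix: symmetric, zero diagonal.
    ids <- paste0("prot", 1:6)
    X <- matrix(9, nrow = 6, ncol = 6, dimnames = list(ids, ids))
    X[1:2, 1:2] <- 0          # prot1 and prot2 are identical
    X[3:5, 3:5] <- 0.8        # prot3, prot4 and prot5 are close to each other
    diag(X) <- 0

    hc <- hclust(as.dist(X))

    # Helper (illustrative name): groups of two or more leaves joined at or below h
    clusters_below <- function(hc, h) {
      lower <- cut(as.dendrogram(hc), h = h)$lower
      multi <- Filter(function(b) attr(b, "members") > 1, lower)
      lapply(multi, labels)
    }

    from_cut <- clusters_below(hc, h = 1)

    # Same grouping via cutree(): split the labels by cluster id, drop singletons
    groups <- split(hc$labels, cutree(hc, h = 1))
    from_cutree <- Filter(function(g) length(g) > 1, groups)

    print(from_cut)
    print(from_cutree)
    ```

    Seen this way, cutree(hc, h = ...) returns one cluster id per protein rather than a list of clusters, so splitting the labels by that id and dropping the singletons may already give what the question asks for. As far as I can tell, the two approaches can differ when h is at or above the height of the root, because cut only detaches branches below the root.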