Tags: r, hierarchical-clustering

How do I keep only the samples in valid clusters?


I have 72 samples in my gene expression dataset datExprSTLMS, and I clustered them with the code below:

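library(dynamicTreeCut)  # provides cutreeDynamic()

# Average-linkage hierarchical clustering of the rows (samples)
new_hclust <- hclust(dist(datExprSTLMS), method = "average")
Cutreecluster_Sample <- cutreeDynamic(dendro = new_hclust, minClusterSize = 5,
                                      method = "tree")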

Then I got the following table:

table(Cutreecluster_Sample)
Cutreecluster_Sample
 0  1  2  3  4 
 1 24 22 18  7 

The sample in cluster 0 is an outlier, and I would like to remove it from my dataset. So I ran the code below to keep every sample except the one in cluster 0:

keepSamples = (Cutreecluster_Sample==!0)

But when I run table() on keepSamples, I get this result:

> table(keepSamples)
keepSamples
FALSE  TRUE 
   48    24 

As you can see, keepSamples marks only 24 samples as TRUE instead of 71. I would appreciate it if anybody could show me, at the code level, how to solve this problem.


Solution

  • Change keepSamples = (Cutreecluster_Sample==!0) to keepSamples = (Cutreecluster_Sample!=0)

    Why? Evaluating your command from right to left: !0 is the logical negation of 0, which is equivalent to !FALSE in R, so !0 is TRUE. You then check whether Cutreecluster_Sample equals TRUE. When compared with a number, TRUE is coerced to 1, so your check is TRUE if and only if a sample is in cluster 1, not cluster 0. That is why exactly 24 samples, the size of cluster 1, come out as TRUE.

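    Try (!0) == 1 and FALSE == 0; both return TRUE. The parentheses matter: ! has lower precedence than == in R, so !0 == 1 without them is parsed as !(0 == 1).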
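A minimal sketch that reproduces the pitfall and the fix without the original data. It assumes a made-up label vector whose cluster sizes copy the table in the question, plus a random matrix standing in for datExprSTLMS. It also assumes samples are in rows, which is what hclust(dist(datExprSTLMS)) clusters.

```r
# Hypothetical stand-in for the cutreeDynamic() output: cluster sizes match
# the question's table (1 outlier in cluster 0, then 24/22/18/7 samples).
Cutreecluster_Sample <- rep(c(0, 1, 2, 3, 4), times = c(1, 24, 22, 18, 7))

# Buggy version: !0 is TRUE, which is compared as 1, so only cluster 1 matches
buggy <- (Cutreecluster_Sample == !0)
print(table(buggy))        # 48 FALSE, 24 TRUE, as in the question

# Fixed version: keep every sample whose label is not 0
keepSamples <- (Cutreecluster_Sample != 0)
print(table(keepSamples))  # 1 FALSE, 71 TRUE

# Hypothetical random matrix standing in for datExprSTLMS (samples in rows)
set.seed(1)
datExprSTLMS <- matrix(rnorm(72 * 10), nrow = 72)

# Drop the outlier row; drop = FALSE keeps the result a matrix
datExprSTLMS_clean <- datExprSTLMS[keepSamples, , drop = FALSE]
print(dim(datExprSTLMS_clean))  # 71 10
```

With the real data, the same logical vector can index the rows directly, e.g. datExprSTLMS <- datExprSTLMS[keepSamples, ].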