pheatmap default distance metric R


I need to make a heatmap with the function 'pheatmap', using UPGMA and 1 - Pearson correlation as the distance metric. My professor claims this is the default distance metric, but in my case pheatmap uses Euclidean distance. Are Euclidean distance and 1 - Pearson correlation the same, or is he wrong? If he is wrong, how can I use the correct distance metric for my heatmap?

My input

library(pheatmap)
library(RColorBrewer)  # provides brewer.pal

ph <- pheatmap(avgreltlog10,
  color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100),
  kmeans_k = NA, breaks = NA, border_color = "grey60",
  cellwidth = 10, cellheight = 10, scale = "none", cluster_rows = TRUE,
  clustering_method = "average", cutree_rows = 4, cutree_cols = 2)

R output

$tree_row

Call:
hclust(d = d, method = method)

Cluster method   : average 
Distance         : euclidean 
Number of objects: 65 


$tree_col

Call:
hclust(d = d, method = method)

Cluster method   : average 
Distance         : euclidean 
Number of objects: 10 

Solution

  • You can check the default settings easily by typing the function name without () in the R console:

    > pheatmap
    

    If you do that, you can see that Euclidean distance is the default (and that the default linkage is "complete", not UPGMA):

    ... clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", clustering_method = "complete", ...
    

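    To use 1 - Pearson correlation, specify it explicitly. Set it for the columns as well if they should be clustered the same way, and keep clustering_method = "average" (already in your call) for UPGMA:

    cluster_rows = TRUE,
    clustering_distance_rows = "correlation",
    clustering_distance_cols = "correlation",
    clustering_method = "average"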
    

    This works because, if you dig into the code, you can see that pheatmap calls cluster_mat, which does this:

    cluster_mat = function(mat, distance, method){
    ...
        if(distance[1] == "correlation"){
            d = as.dist(1 - cor(t(mat)))
        }
    ...
    

    More information is available in the official documentation. There are so many packages around that it's not uncommon to mix things up :)
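
On whether Euclidean distance and 1 - Pearson correlation are the same: they are not, and a small base-R sketch may make the difference concrete. It mirrors the as.dist(1 - cor(t(mat))) line from cluster_mat quoted above and uses hclust with method = "average" for UPGMA. The toy matrix and all variable names are made up for illustration and are not taken from the question's data.

```r
# Toy matrix: rows are the objects to cluster, as in pheatmap's row clustering.
mat <- rbind(
  a = c(1, 2, 3, 4, 5),
  b = c(10, 20, 30, 40, 50),  # same shape as a, but 10 times larger
  c = c(5, 4, 3, 2, 1)        # mirror image of a
)

# Euclidean distance between rows (pheatmap's default).
d_euc <- dist(mat, method = "euclidean")

# 1 - Pearson correlation between rows, as in the cluster_mat snippet.
# cor() correlates columns, hence the transpose.
d_cor <- as.dist(1 - cor(t(mat)))

print(round(as.matrix(d_euc), 2))
print(round(as.matrix(d_cor), 2))

# UPGMA corresponds to average linkage in hclust.
hc_euc <- hclust(d_euc, method = "average")
hc_cor <- hclust(d_cor, method = "average")

# Cut each tree into two groups and compare the memberships.
g_euc <- cutree(hc_euc, k = 2)
g_cor <- cutree(hc_cor, k = 2)
print(g_euc)
print(g_cor)
```

Rows a and b are perfectly correlated, so their correlation distance is numerically zero even though they are far apart in Euclidean terms. Rows a and c are close in Euclidean terms but perfectly anti-correlated. The two metrics therefore group the same rows differently, which is why the choice of clustering_distance_rows matters.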