r, function, for-loop, heatmap, pearson-correlation

How to create a heatmap function filled with the correlation between two variables from a data set?


I'm trying to create a function that calculates the correlation coefficient between two columns of a data set that I have and repeats this for every combination of columns.

Then I want it to plot all of the coefficients in a heatmap.

This is an outline of the dataset and what I want to include in the heatmap.

How would I edit my function so that it cycles through the data set and is able to calculate the correlation coefficient between all of the columns and plot the value in the heatmap? I'm first trying to create an empty data frame with all 0s and then I want it to fill in all the values.

master <- read.table("~/Desktop/Heatmap Project/master.txt", sep = "\t", header = T, stringsAsFactors = F)

vector_a <- master$Median_A
vector_b <- master$Median_B

heatmap_prep <- function(vector_a,vector_b){
    dummy <- as.data.frame(matrix(0, ncol=length(vector_b), nrow=length(vector_a)))
    for (i in 1:length(vector_a)){
        first_number <- vector_a[i]
        for(j in 1:length(vector_b)){
            second_number <- vector_b[j]
            result <- cor(vector_a,vector_b)
            dummy [i,j] <- result

        }
    }
    return(dummy)
}

heatmap_data_matrix <- as.matrix(heatmap_prep(vector_a,vector_b))

#Create heatmap:
library(stats)
library(gplots)
library(RColorBrewer)
heatmap(heatmap_data_matrix,Colv = NA, Rowv=NA, revC=T, scale='none', xlab= "B", ylab= "A", main = "Heatmap", col = rev(brewer.pal(11,"RdBu")))

Thank you so much!


Solution

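  • The following code is a minimal working example based on what you provided. Calling cor() on a whole data frame of numeric columns returns the full matrix of pairwise correlations, so no loop is needed.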

    df <- data.frame("A" = c(12,13,15),
                     "B" = c(15,34,15),
                     "C" = c(16,34,56),
                     "D" = c(455,55,45),
                     "E" = c(78,67,65),
                     "F" = c(67,67,56),
                     "G" = c(67,45,64),
                     "H" = c(56,54,56),
                     "I" = c(56,89,90))
    
    library(reshape2)
    melted_cor <- melt(cor(df))
    library(ggplot2)
    ggplot(data = melted_cor, aes(x=Var1, y=Var2, fill=value)) + 
      geom_tile()
    

    [Image: tile plot]

    It is explained in more detail here: http://www.sthda.com/english/wiki/ggplot2-quick-correlation-matrix-heatmap-r-software-and-data-visualization
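
If you would rather keep the loop and the base heatmap() call from the question, here is a rough sketch of how that could look. The loop in the question runs over the individual values of two vectors and calls cor() on the same two full vectors every time, so every cell ends up with the same number. The sketch instead loops over pairs of columns. The small df below is only a stand-in for your master table, the one-argument heatmap_prep(data) signature is my own assumption rather than your original interface, and colorRampPalette() is swapped in for brewer.pal() so that no extra packages are needed.

```r
# Stand-in data (a subset of the example columns above); replace with the
# numeric columns of your own table.
df <- data.frame(A = c(12, 13, 15),
                 B = c(15, 34, 15),
                 C = c(16, 34, 56),
                 D = c(455, 55, 45))

# Loop version: iterate over pairs of columns, not over individual values.
# (Hypothetical signature: takes a data frame of numeric columns.)
heatmap_prep <- function(data) {
  n <- ncol(data)
  result <- matrix(0, nrow = n, ncol = n,
                   dimnames = list(names(data), names(data)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Pearson correlation between column i and column j
      result[i, j] <- cor(data[[i]], data[[j]])
    }
  }
  result
}

cor_loop <- heatmap_prep(df)
cor_builtin <- cor(df)  # cor() on a data frame gives the same matrix directly

# Both approaches should agree up to floating-point error.
stopifnot(isTRUE(all.equal(cor_loop, cor_builtin)))

# Base-R heatmap with the same settings as in the question; the blue-white-red
# ramp is a substitute for rev(brewer.pal(11, "RdBu")).
heatmap(cor_builtin, Colv = NA, Rowv = NA, revC = TRUE, scale = "none",
        main = "Heatmap",
        col = colorRampPalette(c("blue", "white", "red"))(11))
```

With your own data, something like cor(master[, c("Median_A", "Median_B")]) should work the same way, assuming those columns are numeric.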