Search code examples
rggplot2correlationfacet

How to plot correlation graphs with R^2 for a big datamatrix?


I have a proteomics data matrix. In the data matrix, I have detected a different number of peptides for each protein (detectable peptides numbers vary on the protein).

Q1. How can I plot correlation graphs for each protein to compare how its' peptides behave. i.e. For protein A, I have peptides a1-a3, I want to compare a1 vs a2, a1 vs a3, and a2 vs a3.

Sample data

structure(list(Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"), Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"), Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199, 0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801), Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982, 0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278), Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584, 0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602), Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525, 1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923), Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.000820463, 1.03124071, 0.995897846, 0.289542364, 0.578721727, 0.672592766, 1.168944588), Sample6 = c(1.063124715, 0.623917522, 0.613196611, 0.990921045, 1.014340981, 0.965631141, 0.316793011, 1.02220535, 1.182063616, 1.41196421), Sample7 = c(1.335677026, 0.628621656, 0.411171453, 1.050563412, 1.290233552, 1.1603839, 0.445372411, 1.077192698, 0.726669337, 1.09453338), Sample8 = c(1.139360562, 0.404024829, 0.263714711, 0.899959209, 1.356913804, 1.246338203, 0.426568548, 1.104988267, 0.964924824, 1.083654341), Sample9 = c(1.38146599, 0.582817437, 0.783698738, 1.118948066, 1.010795866, 1.277086848, 0.434025911, 1.238871048, 1.201184368, 1.476478831), Sample10 = c(1.111486801, 0.60513273, 0.460680037, 1.385702246, 1.448873253, 1.364329784, 0.375032044, 1.382750002, 0.741842319, 1.035657705)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Protein = structure(list(), class = c("collector_character", "collector")), Peptide = structure(list(), class = c("collector_character", "collector")), Sample1 = structure(list(), class = c("collector_double", "collector")), Sample2 = structure(list(), class = c("collector_double", "collector")), Sample3 = structure(list(), class = c("collector_double", "collector")), Sample4 = structure(list(), class = c("collector_double", "collector")), Sample5 = structure(list(), class = c("collector_double", "collector")), Sample6 = structure(list(), class = c("collector_double", "collector")), Sample7 = structure(list(), class = c("collector_double", "collector")), Sample8 = structure(list(), class = c("collector_double", "collector")), Sample9 = structure(list(), class = c("collector_double", "collector")), Sample10 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

Hence peptide number varies for each protein, how can I compare each peptide and save the faceted graph into single plots, by this, I can select only required graphs.


Solution

  • "Hence peptide number varies for each protein, how can I compare each peptide and save the faceted graph into single plots, by this, I can select only required graphs." I'm not entirely sure what you actually want to plot. A correlation plot of which quantities? Select only which required graphs?

    Anyway, perhaps the following will help.

    library(GGally)
    library(tidyverse)
    df %>%
        gather(Sample, Value, -Protein, -Peptide) %>%
        spread(Peptide, Value) %>%
        filter(Protein == "A") %>%
        ggpairs(columns = 3:6)
    

    enter image description here

    Explanation: We reshape data such that we have Values for every Peptide in columns; then we filter entries for Protein == "A" and use GGally::ggpairs to show pairwise correlation plots of Values for every Peptide.

    You have a lot of flexibility in customising the output plot of ggpairs (e.g. add regression lines, remove panels, etc.); I recommend taking a look at the GGally GitHub project page and at Multiple regression lines in ggpairs.


    Update

    If you want to show correlation plots only for certain Peptides, you could do the following

    pep_of_interest <- c("a2", "a4")
    df %>%
        gather(Sample, Value, -Protein, -Peptide) %>%
        spread(Peptide, Value) %>%
        filter(Protein == "A") %>%
        ggpairs(columns = match(pep_of_interest, colnames(.)))
    

    enter image description here