I have a proteomics data matrix. In the data matrix, I have detected a different number of peptides for each protein (detectable peptides numbers vary on the protein).
Q1. How can I plot correlation graphs for each protein to compare how its' peptides behave. i.e. For protein A, I have peptides a1-a3, I want to compare a1 vs a2, a1 vs a3, and a2 vs a3.
Sample data
structure(list(Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"), Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"), Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199, 0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801), Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982, 0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278), Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584, 0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602), Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525, 1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923), Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.000820463, 1.03124071, 0.995897846, 0.289542364, 0.578721727, 0.672592766, 1.168944588), Sample6 = c(1.063124715, 0.623917522, 0.613196611, 0.990921045, 1.014340981, 0.965631141, 0.316793011, 1.02220535, 1.182063616, 1.41196421), Sample7 = c(1.335677026, 0.628621656, 0.411171453, 1.050563412, 1.290233552, 1.1603839, 0.445372411, 1.077192698, 0.726669337, 1.09453338), Sample8 = c(1.139360562, 0.404024829, 0.263714711, 0.899959209, 1.356913804, 1.246338203, 0.426568548, 1.104988267, 0.964924824, 1.083654341), Sample9 = c(1.38146599, 0.582817437, 0.783698738, 1.118948066, 1.010795866, 1.277086848, 0.434025911, 1.238871048, 1.201184368, 1.476478831), Sample10 = c(1.111486801, 0.60513273, 0.460680037, 1.385702246, 1.448873253, 1.364329784, 0.375032044, 1.382750002, 0.741842319, 1.035657705)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Protein = structure(list(), class = c("collector_character", "collector")), Peptide = structure(list(), class = c("collector_character", "collector")), Sample1 = structure(list(), class = c("collector_double", "collector")), Sample2 = structure(list(), class = c("collector_double", "collector")), Sample3 = structure(list(), class = c("collector_double", "collector")), Sample4 = structure(list(), class = c("collector_double", "collector")), Sample5 = structure(list(), class = c("collector_double", "collector")), Sample6 = structure(list(), class = c("collector_double", "collector")), Sample7 = structure(list(), class = c("collector_double", "collector")), Sample8 = structure(list(), class = c("collector_double", "collector")), Sample9 = structure(list(), class = c("collector_double", "collector")), Sample10 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
Hence peptide number varies for each protein, how can I compare each peptide and save the faceted graph into single plots, by this, I can select only required graphs.
"Hence peptide number varies for each protein, how can I compare each peptide and save the faceted graph into single plots, by this, I can select only required graphs." I'm not entirely sure what you actually want to plot. A correlation plot of which quantities? Select only which required graphs?
Anyway, perhaps the following will help.
library(GGally)
library(tidyverse)
df %>%
gather(Sample, Value, -Protein, -Peptide) %>%
spread(Peptide, Value) %>%
filter(Protein == "A") %>%
ggpairs(columns = 3:6)
Explanation: We reshape data such that we have Value
s for every Peptide
in columns; then we filter entries for Protein == "A"
and use GGally::ggpairs
to show pairwise correlation plots of Value
s for every Peptide
.
You have a lot of flexibility in customising the output plot of ggpairs
(e.g. add regression lines, remove panels, etc.); I recommend taking a look at the GGally GitHub project page and at Multiple regression lines in ggpairs.
If you want to show correlation plots only for certain Peptide
s, you could do the following
pep_of_interest <- c("a2", "a4")
df %>%
gather(Sample, Value, -Protein, -Peptide) %>%
spread(Peptide, Value) %>%
filter(Protein == "A") %>%
ggpairs(columns = match(pep_of_interest, colnames(.)))