I have a microarray dataset on which I performed a limma
lmFit()
test. If you haven't heard of it before, it's a powerful linear model package that tests differential gene expressions for >20k genes. You can extract the slope and intercept from the model for each one of these genes.
My problem is: given a table of slope and intercept values, how do I match a plot (I don't mind either ggplot2
's geom_abline
, lattice
's panel.abline
, or an alternative if necessary) with its corresponding slope and intercept?
My table (call it "slopeInt") has intercept as column 1 and slope as column 2, and has row names that correspond to the name of the gene. Their names look like this:
"202586_at" "202769_at" "203201_at" "214970_s_at" "219155_at"
These names match my gene names in another table ("Data") containing some details about my samples (I have 24 samples with different IDs and Time/Treatment combination) and the gene expression values.
It's in the long format with the gene names (as above) repeating every 24 rows (different expression levels for same gene, for each one of my samples):
ID Time Treatment Gene_name Gene_exp
... ... ... ... ...
I have overall eight genes I'm interested to plot, and the names in my Data$Gene_name
match the row names of my slopeInt
table. I can also merge the two tables together, that's not a problem. But I tried the following two approaches to give me graphs with graphs for every one of my genes with the appropriate regression, to no avail:
Using ggplot2
:
ggplot(Data, aes(x = Time, y = Gene_exp, group = Time, color = Treatment)) +
facet_wrap(~ Gene_name, scales = "free_x") +
geom_point() +
geom_abline(intercept = Intercept, slope = Time), data = slopeInt) +
theme(panel.grid.major.y = element_blank())`
And also using Lattice
:
xyplot(Gene_exp ~ Time| Gene_name, Data,
jitter.data = T,
panel = function(...){
panel.xyplot(...)
panel.abline(a = slopeInt[,1], b = slopeInt[,2])},
layout = c(4, 2))
I've tried multiple other methods in the actual geom_abline()
and panel.abline()
arguments, including some for loops, but I am not experienced in R and I cannot get it to work.. I can also have the data file in a wide format (separate columns for each gene).
Any help and further directions will be greatly appreciated!!!
Here is some code for a reproducible example:
Data <- data.frame(
ID = rep(1:24, 8),
Time = (rep(rep(c(1, 2, 4, 24), each = 3), 8)),
Treatment = rep(rep(c("control", "smoking"), each = 12), 8),
Gene_name = rep(c("202586_at", "202769_at", "203201_at", "214970_s_at",
"219155_at", "220165_at", "224483_s_at", "227559_at"), each = 24),
Gene_exp = rnorm(192))
slopeInt <- data.frame(
Intercept = rnorm(8),
Slope = rnorm(8))
row.names(slopeInt) <- c("202586_at", "202769_at", "203201_at",
"214970_s_at", "219155_at", "220165_at", "224483_s_at", "227559_at")
With lattice, this should work
xyplot(Gene_exp ~ Time| Gene_name, Data, slopeInt=slopeInt,
jitter.data = T,
panel = function(..., slopeInt){
panel.xyplot(...)
grp <- trellis.last.object()$condlevels[[1]][which.packet()]
panel.abline(a = slopeInt[grp,1], b = slopeInt[grp,2])
},
layout = c(4, 2)
)
using set.seed(15)
before generating the sample data results in the following plot
The "trick" here is to use trellis.last.object()$condlevels
to determine which conditioning block we are currently in. Then we use that information to extract the right slope information from the additional data we now pass in via a parameter. I thought there was a more elegant way to determine the current values of the conditioning variables but if there is I cannot remember it at this time.