I have the following expression data from test and control, each contain two samples.
> test <- read.table("http://dpaste.com/1059889/plain/")
> control <-read.table("http://dpaste.com/1059891/plain/")
# in reality there are more genes in each file.
> test
V1 V2 V3
1 Gene_1 3694.11963 3591.95732
2 Gene_2 791.57558 753.04814
3 Gene_3 2751.34223 2562.46166
4 Gene_4 3068.07188 2651.62906
5 Gene_5 295.00476 247.78944
6 Gene_6 2737.22068 2824.85853
7 Gene_7 1274.54016 1196.54412
8 Gene_8 7011.31102 6703.59308
9 Gene_9 991.71137 1170.66072
10 Gene_10 67.83878 81.69033
11 Gene_11 139.96068 141.97499
12 Gene_12 337.40039 354.96356
13 Gene_13 2861.67548 3132.97426
14 Gene_14 1264.63942 1547.56872
> control
V1 V2 V3
1 Gene_1 98.76904 219.95533
2 Gene_2 64.13716 152.69867
3 Gene_3 84.54906 194.95583
4 Gene_4 106.64893 220.18668
5 Gene_5 50.30000 40.20000
6 Gene_6 24.22860 56.13421
7 Gene_7 43.08251 63.50765
8 Gene_8 408.95196 589.50150
9 Gene_9 37.68644 58.33591
10 Gene_10 100.33000 430.00000
11 Gene_11 23.24041 20.00000
12 Gene_12 17.64007 21.34300
13 Gene_13 65.45922 74.02418
14 Gene_14 43.69905 89.19588
I want to compute P-value using , to see if they are differentially expressed, using ebayes.
Using standard t.test is straightforward, but I find it not useful for small samples.
t.test(c(test[2,]$V1,teste[2,]$V3),c(control[2,]$V1,control[2,]$V3))
What's the way to go about it? It's not clear from the help file.
There are two steps- first you create a matrix that is 4x14- the 4 replicates by the 14 (in your case) genes. Also create an experiment design: a data frame with one column, c(1, 1, 0, 0)
, indicating the first two columns of the matrix are test and the second two are control).
library(limma)
m = as.matrix(cbind(test[, 2:3], control[, 2:3]))
rownames(m) = test[, 1]
d = data.frame(condition=c(1, 1, 0, 0))
Then you use lmFit
to fit the linear model, and eBayes
to adjust it and calculate p-values:
fit = lmFit(m, d)
e = eBayes(fit)
The e
object, which is of the class MArrayLM
, contains the p-values and estimated coefficients, along with some other information:
print(e$coefficients)
condition
# Gene_1 3643.03848
# Gene_2 772.31186
# ...
print(e$p.value)
# condition
# Gene_1 6.299507e-07
# Gene_2 1.333970e-04
names(e)
# [1] "coefficients" "rank" "assign" "qr"
# [5] "df.residual" "sigma" "cov.coefficients" "stdev.unscaled"
# [9] "pivot" "genes" "Amean" "method"
# [13] "design" "df.prior" "s2.prior" "var.prior"
# [17] "proportion" "s2.post" "t" "df.total"
# [21] "p.value" "lods" "F" "F.p.value"
Incidentally, while limma
is better for multiple reasons (it calculates other parameters, provides more options, and performs useful corrections), if you did want to perform a t.test
to get p-values for each gene in a matrix, you could do this:
pvals = apply(m, 1, function(r) t.test(r ~ d$condition)$p.value)