In the following data frame, I want to calculate a p-value for each protein by comparing the 'control' replicates with the 'treated' replicates. I am very new to R and I just want to see if I can move away from Excel for tasks like these. In reality, I'll have thousands of proteins. I'll then use p.adjust() to correct for multiple hypothesis testing.
I'd be very grateful for any advice.
Protein Control_1 Control_2 Control_3 Treated_1 Treated_2 Treated_3
1 1 7.15 7.16 7.11 6.91 6.88 6.92
2 2 6.64 6.61 6.59 6.37 6.35 6.41
3 3 3.68 3.78 3.81 2.40 2.09 2.17
4 4 5.04 5.01 4.69 3.43 3.52 3.66
5 5 6.92 6.81 6.90 7.12 7.21 7.27
Desired output:
Protein Control_1 Control_2 Control_3 Treated_1 Treated_2 Treated_3 P-value
1 1 7.15 7.16 7.11 6.91 6.88 6.92 0.000413
2 2 6.64 6.61 6.59 6.37 6.35 6.41 0.000742
3 3 3.68 3.78 3.81 2.40 2.09 2.17 0.001010
4 4 5.04 5.01 4.69 3.43 3.52 3.66 0.001262
5 5 6.92 6.81 6.90 7.12 7.21 7.27 0.004306
Updated with @StupidWolf's comment.
Since you are new to R, here is a solution that is easy to understand and modify.
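# Generate data that looks like yours (random, so the values change on every run)
df <- data.frame(Protein=1:5,Control_1=rnorm(5,5),Control_2=rnorm(5,5),
                 Control_3=rnorm(5,5),Treated_1=rnorm(5,5),Treated_2=rnorm(5,5),
                 Treated_3=rnorm(5,5))
# Find the control and treated columns once, by name
ctrl_cols <- grep("Control",colnames(df))
trt_cols <- grep("Treated",colnames(df))
# One t-test per row (protein); t.test() runs a Welch test by default.
# unlist() turns the one-row data frame into a plain numeric vector.
p_vals <- rep(NA_real_,nrow(df))
for(i in seq_len(nrow(df))){
  p_vals[i] <- t.test(unlist(df[i,ctrl_cols]),unlist(df[i,trt_cols]))$p.value
}
df <- cbind(df,Pvalue=p_vals)
df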
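This should give you something like the following (the exact numbers will differ, because the example data are random):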
Protein Control_1 Control_2 Control_3 Treated_1 Treated_2 Treated_3 Pvalue
1 1 5.813581 5.149145 4.662203 5.481839 6.424654 5.503664 0.2621811
2 2 4.191440 6.155372 5.773128 3.941712 5.945056 4.182457 0.4769504
3 3 4.654504 4.598808 5.258675 4.101895 6.135411 4.276641 0.9993112
4 4 5.426672 4.520739 6.293757 3.787395 5.274740 3.847900 0.1909877
5 5 5.614929 6.993289 3.786346 5.193352 5.362928 4.746676 0.7353676
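For thousands of proteins, the loop above can also be written with apply(), and the p.adjust() step mentioned in the question can follow directly. The sketch below uses the five example rows from the question; the column name Padj and the choice of method = "BH" (Benjamini-Hochberg) are assumptions made for illustration, not something the question specifies.

```r
# Example data copied from the question
df <- data.frame(
  Protein   = 1:5,
  Control_1 = c(7.15, 6.64, 3.68, 5.04, 6.92),
  Control_2 = c(7.16, 6.61, 3.78, 5.01, 6.81),
  Control_3 = c(7.11, 6.59, 3.81, 4.69, 6.90),
  Treated_1 = c(6.91, 6.37, 2.40, 3.43, 7.12),
  Treated_2 = c(6.88, 6.35, 2.09, 3.52, 7.21),
  Treated_3 = c(6.92, 6.41, 2.17, 3.66, 7.27)
)

# Locate the replicate columns by name
ctrl_cols  <- grep("^Control", colnames(df))
treat_cols <- grep("^Treated", colnames(df))

# Numeric matrix with the control columns first, then the treated columns,
# so apply() passes each row to the function as a numeric vector
vals   <- as.matrix(df[, c(ctrl_cols, treat_cols)])
n_ctrl <- length(ctrl_cols)

# One Welch t-test per row (t.test() default)
df$Pvalue <- apply(vals, 1, function(row) {
  t.test(row[seq_len(n_ctrl)], row[-seq_len(n_ctrl)])$p.value
})

# Multiple-testing correction; "BH" is an assumed choice, see ?p.adjust for others
df$Padj <- p.adjust(df$Pvalue, method = "BH")
df
```

Adjusting after all the raw p-values have been collected matters, because p.adjust() needs the full vector to know how many tests were run.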
You can swap t.test() for another test, such as a non-parametric one, if you like.
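As a rough sketch of that swap, the test can be passed in as a function argument. The helper name row_pvals and the seed are hypothetical, introduced here only for illustration; wilcox.test() is the base R Wilcoxon rank-sum test and, like t.test(), returns a list with a p.value element.

```r
# Hypothetical helper: applies any two-sample test row by row
row_pvals <- function(df, test_fun = t.test) {
  ctrl  <- grep("Control", colnames(df))
  treat <- grep("Treated", colnames(df))
  vapply(seq_len(nrow(df)), function(i) {
    # unlist() converts the one-row data frame slices to numeric vectors
    test_fun(unlist(df[i, ctrl]), unlist(df[i, treat]))$p.value
  }, numeric(1))
}

set.seed(42)  # arbitrary seed, only so the simulated data are reproducible
df <- data.frame(Protein = 1:5,
                 Control_1 = rnorm(5, 5), Control_2 = rnorm(5, 5),
                 Control_3 = rnorm(5, 5), Treated_1 = rnorm(5, 5),
                 Treated_2 = rnorm(5, 5), Treated_3 = rnorm(5, 5))

df$P_ttest  <- row_pvals(df, t.test)       # parametric (Welch) test
df$P_wilcox <- row_pvals(df, wilcox.test)  # non-parametric alternative
df
```

One caveat worth knowing before switching: with only three replicates per group, the exact two-sided Wilcoxon rank-sum test cannot return a p-value below 0.1, since there are just choose(6, 3) = 20 possible rank arrangements. Rank-based tests therefore have very little power at this sample size.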