I have a tab delineated table in which the last three columns contain statistical values, and I would like to retrieve only rows in which the columns are ranked in a desired manner. The rows I am interested in are those where the values in the columns are ranked as such A_C>A_B>B_C.
Here is an example of the table:
marker chr A_B A_C B_C
rs1000073 1 0.097328991622858 0.101954778294364 0.0155614929271569
rs1000283 1 0.194891573233045 0.0612572864045251 0.0287416461802493
rs1000352 1 0.146693199204067 0.166583183464355 -0.00301950205401285
rs1000451 1 0.116693199204067 0.266583183464355 0.00401950205401285
So in this case, I would only want to retrieve the rs1000352 and rs1000073 rows (the actual table has more than a million rows in it, but you get the idea).
From there I will write the rows of interest to a new tab deliminated text file (I know how to do this part).
Does anyone have any suggestions on how to do this?
a data.table
solution, syntax sugar!:
DT <- data.table(dt)
dt <- DT[(A_C>A_B) & (A_B>B_C)]
You can even check the result visually:
library(reshape2)
dtl <- melt(dt)
library(ggplot2)
ggplot(subset(dtl,variable!='chr'))+
geom_point(aes(marker,value,color=variable),size=5)
ggplot(subset(dtl,variable!='chr'))+
geom_point(aes(marker,value,color=reorder(variable,value)),size=5)