Search code examples
rdataframerankingtabular

Isolating rows from a table that have columns ranked in a desired manner


I have a tab delineated table in which the last three columns contain statistical values, and I would like to retrieve only rows in which the columns are ranked in a desired manner. The rows I am interested in are those where the values in the columns are ranked as such A_C>A_B>B_C.

Here is an example of the table:

marker  chr A_B A_C B_C
rs1000073   1   0.097328991622858   0.101954778294364   0.0155614929271569
rs1000283   1   0.194891573233045   0.0612572864045251  0.0287416461802493
rs1000352   1   0.146693199204067   0.166583183464355   -0.00301950205401285
rs1000451   1   0.116693199204067   0.266583183464355   0.00401950205401285

So in this case, I would only want to retrieve the rs1000352 and rs1000073 rows (the actual table has more than a million rows in it, but you get the idea).

From there I will write the rows of interest to a new tab deliminated text file (I know how to do this part).

Does anyone have any suggestions on how to do this?


Solution

  • a data.table solution, syntax sugar!:

    DT <- data.table(dt)
    dt <- DT[(A_C>A_B) & (A_B>B_C)]
    

    You can even check the result visually:

    library(reshape2)
    dtl <- melt(dt)
    library(ggplot2)
    ggplot(subset(dtl,variable!='chr'))+
      geom_point(aes(marker,value,color=variable),size=5)
    ggplot(subset(dtl,variable!='chr'))+
      geom_point(aes(marker,value,color=reorder(variable,value)),size=5)
    

    enter image description here