
R - How to extract unique intersections between groups in a binary matrix?


Dear fellow Stack Overflow users,

I am a beginner in R, which I use to analyse biological data, and I am facing a problem I haven't been able to solve yet. Maybe someone more experienced can help me out?

I have a large data frame that is a binary matrix: each row represents a different gene and each column a different condition in an experiment.

A "1" in a cell indicates that the gene is present in the given condition; a "0" indicates that it is not present.

How do I get a vector with the row names of the rows that contain a "1" in a given column but in no other column (i.e., genes that are uniquely present in that condition)?

And how can I get a vector with the row names of the rows that contain a "1" in a specified set of columns but "0" in all other columns (i.e., genes that are uniquely present in, for example, conditions/columns 1, 2 and 5)?

I am looking forward to your suggestions!

Many thanks:-)


Solution

  • Here is one possibility using the tidyverse packages. Since you did not provide any data, I created some dummy data that looks like this:

    EDIT: I included row names.

    > mydata
         A B C D E
    id_1 0 1 1 0 0
    id_2 0 1 0 1 0
    id_3 1 1 1 1 0
    id_4 1 0 0 0 0
    id_5 0 0 1 1 1
    id_6 1 0 1 0 0
    

    So I have six rows (named id_1 to id_6) with 5 columns named A to E.
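To make the example reproducible, the dummy data can be rebuilt in base R. This is a sketch: the values are copied from the printout above, the helper names `values` and `m` are illustrative only, and the original answer did not show how `mydata` was actually created.

```r
# Sketch: rebuild the 6 x 5 dummy binary matrix printed above.
# The values are copied from the printout; this step was not shown in the original answer.
values <- c(0, 1, 1, 0, 0,   # id_1
            0, 1, 0, 1, 0,   # id_2
            1, 1, 1, 1, 0,   # id_3
            1, 0, 0, 0, 0,   # id_4
            0, 0, 1, 1, 1,   # id_5
            1, 0, 1, 0, 0)   # id_6
# Fill row by row; store as integers so the columns print as <int> in a tibble.
m <- matrix(as.integer(values), nrow = 6, byrow = TRUE,
            dimnames = list(paste0("id_", 1:6), LETTERS[1:5]))
# Convert to a data frame; the matrix row names (id_1 ... id_6) become its row names.
mydata <- as.data.frame(m)
mydata
```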

    Say I want to keep only the rows where columns "B" and "D" are equal to 1 and all other columns are equal to 0. This can be done like this:

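    library(tidyverse)
    mydata %>% as_tibble(rownames = "id") %>%                 # move the row names into an "id" column
      filter_at(vars(c("B", "D")), all_vars(. == 1)) %>%      # B and D must both be 1
      filter_at(vars(-c("B", "D", "id")), all_vars(. == 0))   # every other condition must be 0
    # Append %>% pull(id) to get a plain character vector of IDs instead of a tibble.
    # (filter_at() is superseded since dplyr 1.0.0; if_all() is the newer equivalent.)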
    
    # A tibble: 1 x 6
      id        A     B     C     D     E
      <chr> <int> <int> <int> <int> <int>
    1 id_2     0     1     0     1     0
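If you prefer base R, or want a plain character vector of row names as the question asks, both questions reduce to one check: a row qualifies when its 1s fall exactly on the chosen columns. A single column is just a set of size one. Below is a minimal sketch; `unique_to()` is a hypothetical helper name, and it assumes the data contains only 0/1 values with the gene IDs stored as row names.

```r
# Hypothetical helper (not from the original answer): return the row names of the
# rows that are 1 in every column listed in `cols` and 0 in all other columns.
# Assumes `data` holds only 0/1 values and the gene IDs are its row names.
unique_to <- function(data, cols) {
  m <- as.matrix(data)
  # Guard against typos in the requested column names.
  stopifnot(all(cols %in% colnames(m)))
  # Target pattern: 1 for the chosen columns, 0 everywhere else.
  pattern <- as.integer(colnames(m) %in% cols)
  # t(m) has one column per gene, so `pattern` is recycled down each column;
  # a gene qualifies when none of its values differ from the pattern.
  keep <- colSums(t(m) != pattern) == 0
  rownames(m)[keep]
}
```

With the dummy data above, `unique_to(mydata, "A")` returns `"id_4"` and `unique_to(mydata, c("B", "D"))` returns `"id_2"`, matching the tibble result. The order of `cols` does not matter, and columns can be selected by position via `unique_to(mydata, colnames(mydata)[c(1, 2, 5)])`.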