Dear fellow Stack Overflow users,
I am a beginner in using R to analyse biological data and am facing a problem that I haven't been able to solve yet. Maybe someone more experienced can help me out?
I have a large data frame which is a binary matrix. Each row represents a different gene; each column a different condition in an experiment.
"1" in a cell indicates that the gene is present in the given condition, "0" indicates that the gene is not present.
How do I get a vector with the rownames of the rows that contain a "1" only in a given column and in no other column (i.e., genes that are uniquely present in that condition)?
And how can I get a vector with the rownames of the rows that contain "1" in a specified set of columns but "0" in all other columns (i.e., genes that are uniquely present in conditions/columns 1, 2 and 5, for example)?
I am looking forward to your suggestions!
Many thanks:-)
Here is a possibility using the tidyverse.
Since you did not provide any data, I created some dummy data, which looks like this:
EDIT: I included rownames
> mydata
     A B C D E
id_1 0 1 1 0 0
id_2 0 1 0 1 0
id_3 1 1 1 1 0
id_4 1 0 0 0 0
id_5 0 0 1 1 1
id_6 1 0 1 0 0
So I have six rows (named id_1 to id_6) with 5 columns named A to E.
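The construction of the dummy data is not shown above. A minimal sketch to reproduce it, assuming it is an integer matrix with rownames (a data frame with the same rownames should behave the same way in the steps below):

```r
# Rebuild the dummy data as an integer matrix; byrow = TRUE fills it row by row.
mydata <- matrix(
  c(0L, 1L, 1L, 0L, 0L,
    0L, 1L, 0L, 1L, 0L,
    1L, 1L, 1L, 1L, 0L,
    1L, 0L, 0L, 0L, 0L,
    0L, 0L, 1L, 1L, 1L,
    1L, 0L, 1L, 0L, 0L),
  nrow = 6, byrow = TRUE,
  # rownames id_1 .. id_6, column names A .. E
  dimnames = list(paste0("id_", 1:6), LETTERS[1:5])
)
```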
Say I want to filter all rows where "B" and "D" are equal to 1 and the other columns are equal to zero. This can be done like this:
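library(tidyverse)
# Move the rownames into an "id" column so they survive the filtering
mydata %>% as_tibble(rownames = "id") %>%
  # keep rows where B and D are both 1
  filter_at(vars(c("B", "D")), all_vars(. == 1)) %>%
  # ... and where every remaining column (except id) is 0
  filter_at(vars(-c("B", "D", "id")), all_vars(. == 0))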
# A tibble: 1 x 6
  id        A     B     C     D     E
  <chr> <int> <int> <int> <int> <int>
1 id_2      0     1     0     1     0
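Since the question asks for a vector of rownames rather than a filtered table, here is a base R sketch of the same idea. The helper name unique_to is made up for illustration, and the small mydata matrix is rebuilt here only so the snippet runs on its own. It assumes the data contains nothing but 0 and 1 and has rownames.

```r
# Same dummy data as above, rebuilt so this snippet is self-contained.
mydata <- matrix(
  c(0, 1, 1, 0, 0,
    0, 1, 0, 1, 0,
    1, 1, 1, 1, 0,
    1, 0, 0, 0, 0,
    0, 0, 1, 1, 1,
    1, 0, 1, 0, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("id_", 1:6), LETTERS[1:5])
)

# Hypothetical helper: names of the rows that are 1 in every column listed
# in `cols` (column names or column numbers) and 0 in every other column.
unique_to <- function(x, cols) {
  m <- as.matrix(x) == 1                       # TRUE where the gene is present
  if (is.character(cols)) cols <- match(cols, colnames(m))
  target <- seq_len(ncol(m)) %in% cols         # columns that must be 1
  in_all  <- rowSums(m[, target, drop = FALSE]) == sum(target)
  in_none <- rowSums(m[, !target, drop = FALSE]) == 0
  rownames(m)[in_all & in_none]
}

unique_to(mydata, "A")          # "id_4"
unique_to(mydata, c("B", "D"))  # "id_2"
unique_to(mydata, c(1, 3))      # "id_6"
```

The first call covers the single-condition case and the other two cover a set of conditions, so one function should answer both parts of the question.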