I have a data frame with many columns and rows, for example
ID Group A B C D E F G …
1 1 0.1 0.0 0.5 0.0 0.9 0.0 0.0 …
2 1 0.1 0.0 0.0 0.7 0.0 0.0 0.3 …
3 2 0.0 0.1 0.6 0.0 0.0 0.2 0.0 …
1 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 …
3 1 0.0 0.5 0.0 0.0 0.0 0.6 0.0 …
1 2 0.2 0.1 0.2 0.0 0.0 0.0 0.1 …
2 2 0.1 0.1 0.1 0.1 0.1 0.1 0.1 …
1 2 0.0 0.2 0.8 0.0 0.0 1.3 1.1 …
I'd like to compare the rows with same ID across all column from A to the end. In the compared rows, if there is at least one value from group 1 is not zero, and there is at least one value from group 2 is not zero, then write this column name and value into a table. So the output table would look like this:
ID Group A B C D E F G …
1 1 0.1 NA 0.5 NA NA NA NA …
1 1 0.0 NA 0.0 NA NA NA NA …
1 2 0.2 NA 0.2 NA NA NA NA …
1 2 0.0 NA 0.8 NA NA NA NA …
2 1 0.1 NA NA 0.7 NA NA 0.3 …
2 2 0.1 NA NA 0.1 NA NA 0.1 …
3 2 NA 0.1 NA NA NA 0.2 NA …
3 1 NA 0.5 NA NA NA 0.6 NA …
As I am new to use R, this seems to be a big challenge for me. I would appreciate any help on my question. Thank you very much!
Here is one option using dplyr
, it's used your logic but in the opposite way I hope it solves your problem. For a given column, we replace column values for a given ID with NA when all values in group 1 or 2 are equal to zero.
library(dplyr)
df %>% arrange(ID) %>% # arrange ID in ascending order
group_by(ID) %>% # group by ID
#mutate at all columns except ID and Group
mutate_at(vars(-ID,-Group), ~replace(.,all(.[Group==1]==0)|all(.[Group==2]==0),NA))
# A tibble: 8 x 9
# Groups: ID [3]
ID Group A B C D E F G
<int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.1 NA 0.5 NA NA NA NA
2 1 1 0 NA 0 NA NA NA NA
3 1 2 0.2 NA 0.2 NA NA NA NA
4 1 2 0 NA 0.8 NA NA NA NA
5 2 1 0.1 NA NA 0.7 NA NA 0.3
6 2 2 0.1 NA NA 0.1 NA NA 0.1
7 3 2 NA 0.1 NA NA NA 0.2 NA
8 3 1 NA 0.5 NA NA NA 0.6 NA
Data
df <- read.table(text ="
ID Group A B C D E F G
1 1 0.1 0.0 0.5 0.0 0.9 0.0 0.0
2 1 0.1 0.0 0.0 0.7 0.0 0.0 0.3
3 2 0.0 0.1 0.6 0.0 0.0 0.2 0.0
1 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 1 0.0 0.5 0.0 0.0 0.0 0.6 0.0
1 2 0.2 0.1 0.2 0.0 0.0 0.0 0.1
2 2 0.1 0.1 0.1 0.1 0.1 0.1 0.1
1 2 0.0 0.2 0.8 0.0 0.0 1.3 1.1",
header = TRUE, stringsAsFactors = FALSE)