I am trying to write a for loop to check whether the relative abundances of observations within each group add up to 100. In the simplified example below, I want to check that all the relative abundance (RelAb) values associated with batch A1 add up to 100, and likewise for every other batch.
Batch | Reads | RelAb |
---|---|---|
A1 | 28431 | 72.94 |
A1 | 10549 | 27.06 |
B1 | 19315 | 85.96 |
B1 | 3155 | 14.04 |
If I were to check each batch one by one, I would have to repeat the following code and change the batch name each time.
test.batch <- data.batch %>%
  dplyr::filter(Batch == "A1")
sum(test.batch$RelAb)
I was able to get values of 100 for each batch I checked manually, but I didn't want to repeat the same line of code again and again.
So I tried writing a for loop:
Batches <- c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "B5", "B6", "B7")
for(i in Batches) {
  filtered.batch <- data.batch %>%
    dplyr::filter(Batch %in% Batches)
  print(sum(filtered.batch$RelAb))
}
The loop ran without errors, but every iteration printed 1100 instead of 100:
[1] 1100
[1] 1100
[1] 1100
[1] 1100
[1] 1100
[1] 1100
[1] 1100
[1] 1100
[1] 1100
[1] 1100
[1] 1100
Incidentally, the Batches vector has 11 elements, but I'm not sure how or why the correct result of 100 got multiplied by 11.
I also tried subsetting instead of dplyr::filter but got the same result as above.
for(i in Batches) {
  filtered.batch <- data.batch[data.batch$Batch %in% Batches, ]
  print(sum(filtered.batch$RelAb))
}
I'm sure a very simple solution would solve this issue (which is not even urgent because repeating a line of code 11 times isn't the biggest problem), but I'm very curious how this could be fixed so I can write correct code in the future. Thanks!
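The loop variable `i` is never used inside the loop body. `Batch %in% Batches` keeps the rows of all 11 batches on every iteration, so each pass sums the whole table and prints 11 × 100 = 1100. A minimal sketch of a corrected loop follows, assuming the data frame has the `Batch` and `RelAb` columns from the simplified example; the toy `data.batch` built here and the `batch.sums` vector are illustrative, not part of the original code.

```r
# Toy data from the question (only two of the eleven batches).
data.batch <- data.frame(
  Batch = c("A1", "A1", "B1", "B1"),
  Reads = c(28431, 10549, 19315, 3155),
  RelAb = c(72.94, 27.06, 85.96, 14.04)
)

Batches <- unique(data.batch$Batch)

# Named numeric vector that collects one sum per batch (hypothetical helper).
batch.sums <- setNames(numeric(length(Batches)), Batches)

for (i in Batches) {
  # Compare against the loop variable i, not the whole Batches vector.
  filtered.batch <- data.batch[data.batch$Batch == i, ]
  batch.sums[i] <- sum(filtered.batch$RelAb)
}

print(batch.sums)
```

The dplyr version of the same fix would be `dplyr::filter(Batch == i)` in place of `dplyr::filter(Batch %in% Batches)`.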
A grouped summary gives one sum per batch without writing a loop:
library(tidyverse)
df <- read_table("Batch Reads RelAb
A1 28431 72.94
A1 10549 27.06
B1 19315 85.96
B1 3155 14.04")
# .by requires dplyr >= 1.1.0; on older versions use group_by(Batch) first
df %>%
  summarise(sum = sum(RelAb),
            threshold = sum(RelAb) >= 100,
            .by = Batch)
# A tibble: 2 x 3
Batch sum threshold
<chr> <dbl> <lgl>
1 A1 100 TRUE
2 B1 100 TRUE
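Because RelAb values are rounded decimals, a sum can land a hair below or above 100 in floating-point arithmetic, and `>= 100` also accepts totals well over 100. A hedged variant that compares against 100 within a tolerance might look like the sketch below; the tolerance `tol` and the column names `total` and `is_100` are assumptions chosen for illustration.

```r
library(dplyr)

df <- tibble(
  Batch = c("A1", "A1", "B1", "B1"),
  RelAb = c(72.94, 27.06, 85.96, 14.04)
)

tol <- 1e-6  # assumed tolerance; loosen it if RelAb was rounded more coarsely

check <- df %>%
  group_by(Batch) %>%
  summarise(total  = sum(RelAb),
            is_100 = abs(sum(RelAb) - 100) < tol)

print(check)

# Batches whose relative abundances do not sum to 100 (none in this toy data)
failing <- check %>% filter(!is_100)
```

Filtering on `!is_100` lists only the batches that need a closer look, which scales better than reading 11 printed sums.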