I have a dataframe looking like
Sample | Value | Domain |
---|---|---|
S1 | 12 | Domain_Identified_X13_A |
S1 | 25 | Domain_Identified_X28_B |
S1 | 18 | Domain_Unidentified |
I want to aggregate values for the rows including string of "Identified" to have a final df
Sample | Value | Domain |
---|---|---|
S1 | 37 | Domain_Identified |
S1 | 18 | Domain_Unidentified |
Thank you
Using summarize
from dplyr
and sub
to get the groups.
library(dplyr)
df %>%
group_by(Sample, Domain = sub("(.*dentified).*", "\\1", Domain)) %>%
summarize(Value = sum(Value), .groups="drop")
# A tibble: 2 × 3
Sample Domain Value
<chr> <chr> <int>
1 S1 Domain_Identified 37
2 S1 Domain_Unidentified 18
Alternative a base R approach using aggregate
setNames(aggregate(df$Value, by=
list(df$Sample, sub("(.*dentified).*", "\\1", df$Domain)), \(x)
sum(x)), c(colnames(df)[c(1,3)], "Value"))
Sample Domain Value
1 S1 Domain_Identified 37
2 S1 Domain_Unidentified 18