I have data shaped like this:
set.seed(123456)
domain <- as.factor(rep(c('alpha', 'bravo', 'charlie', 'delta', 'echo',
                          'foxtrot', 'golf', 'hotel', 'india', 'juliet'),
                        each = 8))
group <- as.factor(rep(c('group 1', 'group 2', 'group 3', 'group 4', 'group 5',
                         'group 6', 'group 7', 'group 8'), 10))
freq <- signif(rnorm(80, mean = 1750, sd = 500), 1)
df <- data.frame(domain, group, freq)
df
domain group freq
1 alpha group 1 2000
2 alpha group 2 2000
3 alpha group 3 2000
4 alpha group 4 2000
5 alpha group 5 3000
6 alpha group 6 2000
7 alpha group 7 2000
8 alpha group 8 3000
9 bravo group 1 2000
10 bravo group 2 2000
11 bravo group 3 1000
12 bravo group 4 1000
13 bravo group 5 2000
14 bravo group 6 2000
15 bravo group 7 2000
16 bravo group 8 2000
17 charlie group 1 1000
18 charlie group 2 2000
19 charlie group 3 3000
20 charlie group 4 2000
21 charlie group 5 1000
22 charlie group 6 2000
...
I'm trying to subtract the freq value of group 1 from the freq value of group 5 within each of the 10 domains, whilst retaining the rest of the original data frame. This code will be run on multiple datasets, so it needs to be automated and easily reproducible across multiple users.
This is what I'm after; note the changes to group 5 in each domain:
domain group freq
1 alpha group 1 2000
2 alpha group 2 2000
3 alpha group 3 2000
4 alpha group 4 2000
5 alpha group 5 **1000**
6 alpha group 6 2000
7 alpha group 7 2000
8 alpha group 8 3000
9 bravo group 1 2000
10 bravo group 2 2000
11 bravo group 3 1000
12 bravo group 4 1000
13 bravo group 5 **0**
14 bravo group 6 2000
15 bravo group 7 2000
16 bravo group 8 2000
17 charlie group 1 1000
18 charlie group 2 2000
19 charlie group 3 3000
20 charlie group 4 2000
21 charlie group 5 **0**
22 charlie group 6 2000
...
I've tried using group_by() from dplyr in combination with ifelse() statements, as well as base R, to do this, to no avail. Similar questions on this site aim to subtract a value from all others in a group, which is not what I'm after.
If anyone could assist with what I imagine is a fairly simple dplyr command to get this, I'd appreciate it.
This is my first question, so please let me know if there are any housekeeping rules I could follow in a better manner!
You should be able to simply use mutate here with an ifelse, a little bit of subsetting, and .by = domain (the .by argument requires dplyr 1.1.0 or later) in the following way:
library(dplyr)

df %>%
  mutate(diffvals = ifelse(!(group %in% "group 5"), freq,
                           freq[group == "group 5"] - freq[group == "group 1"]),
         .by = domain)
Output - note I created a new variable (diffvals) just for demonstration/verification purposes. You could overwrite the original variable per your desired output by changing mutate(diffvals = ...) to mutate(freq = ...):
domain group freq diffvals
1 alpha group 1 2000 2000
2 alpha group 2 2000 2000
3 alpha group 3 2000 2000
4 alpha group 4 2000 2000
5 alpha group 5 3000 1000
6 alpha group 6 2000 2000
7 alpha group 7 2000 2000
8 alpha group 8 3000 3000
9 bravo group 1 2000 2000
10 bravo group 2 2000 2000
11 bravo group 3 1000 1000
12 bravo group 4 1000 1000
13 bravo group 5 2000 0
14 bravo group 6 2000 2000
15 bravo group 7 2000 2000
16 bravo group 8 2000 2000
17 charlie group 1 1000 1000
18 charlie group 2 2000 2000
19 charlie group 3 3000 3000
20 charlie group 4 2000 2000
21 charlie group 5 1000 0
22 charlie group 6 2000 2000
...
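If you want to overwrite freq directly rather than add a verification column, the same idea can be sketched as below. This is a minimal sketch, not the only way to do it: it swaps ifelse for base R's replace(), rebuilds the example data from the question so it runs on its own, and assumes every domain contains exactly one "group 1" row and one "group 5" row. The name result is just a placeholder.

```r
library(dplyr)  # the .by argument needs dplyr >= 1.1.0

# Rebuild the example data from the question
set.seed(123456)
domain <- as.factor(rep(c('alpha', 'bravo', 'charlie', 'delta', 'echo',
                          'foxtrot', 'golf', 'hotel', 'india', 'juliet'),
                        each = 8))
group <- as.factor(rep(paste('group', 1:8), 10))
freq <- signif(rnorm(80, mean = 1750, sd = 500), 1)
df <- data.frame(domain, group, freq)

# Within each domain, replace only the "group 5" value with
# (group 5 - group 1); all other rows keep their original freq.
# Assumption: exactly one "group 1" and one "group 5" row per domain.
result <- df %>%
  mutate(freq = replace(freq, group == "group 5",
                        freq[group == "group 5"] - freq[group == "group 1"]),
         .by = domain)

head(result, 8)
```

Because df itself is not modified, you can compare result against it to confirm that only the group 5 rows changed before running this across other datasets.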