
Subtract values of specified subgroups from another within multiple larger groups


I have data shaped like this:

set.seed(123456)
domain <- as.factor(rep(c('alpha', 'bravo', 'charlie', 'delta', 'echo', 
                          'foxtrot', 'golf', 'hotel', 'india', 'juliet'), 
                        each = 8))
group <- as.factor(rep(c('group 1', 'group 2', 'group 3', 'group 4', 'group 5', 
                         'group 6', 'group 7', 'group 8'), 10))
freq <- signif(rnorm(80, mean = 1750, sd = 500), 1)
df <- data.frame(domain, group, freq)

df

    domain   group freq
1    alpha group 1 2000
2    alpha group 2 2000
3    alpha group 3 2000
4    alpha group 4 2000
5    alpha group 5 3000
6    alpha group 6 2000
7    alpha group 7 2000
8    alpha group 8 3000
9    bravo group 1 2000
10   bravo group 2 2000
11   bravo group 3 1000
12   bravo group 4 1000
13   bravo group 5 2000
14   bravo group 6 2000
15   bravo group 7 2000
16   bravo group 8 2000
17 charlie group 1 1000
18 charlie group 2 2000
19 charlie group 3 3000
20 charlie group 4 2000
21 charlie group 5 1000
22 charlie group 6 2000
...

I'm trying to subtract the freq value of group 1 from the freq value of group 5 within each of the 10 domains, whilst retaining the rest of the original data frame. This code will be run on multiple datasets, so it needs to be automated and easily reproducible across multiple users.

This is what I'm after; note the changes to group 5 in each domain:

    domain   group freq
1    alpha group 1 2000
2    alpha group 2 2000
3    alpha group 3 2000
4    alpha group 4 2000
5    alpha group 5 **1000**
6    alpha group 6 2000
7    alpha group 7 2000
8    alpha group 8 3000
9    bravo group 1 2000
10   bravo group 2 2000
11   bravo group 3 1000
12   bravo group 4 1000
13   bravo group 5 **0**
14   bravo group 6 2000
15   bravo group 7 2000
16   bravo group 8 2000
17 charlie group 1 1000
18 charlie group 2 2000
19 charlie group 3 3000
20 charlie group 4 2000
21 charlie group 5 **0**
22 charlie group 6 2000
...

I've tried using group_by() from dplyr in combination with ifelse() statements, as well as base R, to no avail. Similar questions on this site aim to subtract a value from all others in a group, which is not what I'm after.

If anyone could assist with what I imagine is a fairly simple dplyr command to get this, I'd appreciate it.

This is my first question, so please let me know if there are any housekeeping rules I could follow in a better manner!


Solution

  • You should be able to simply use mutate here with an ifelse, a little bit of subsetting, and .by = domain, in the following way:

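    library(dplyr)  # the .by argument needs dplyr 1.1.0 or later

    df %>%
      # rows other than group 5 keep freq; group 5 becomes (group 5 - group 1)
      mutate(diffvals = ifelse(!(group %in% "group 5"), freq,
                               freq[group == "group 5"] - freq[group == "group 1"]),
             .by = domain)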
    

    Output (note that I created a new variable, diffvals, just for demonstration and verification purposes; to overwrite the original variable as in your desired output, change mutate(diffvals = ...) to mutate(freq = ...)):

        domain   group freq diffvals
    1    alpha group 1 2000     2000
    2    alpha group 2 2000     2000
    3    alpha group 3 2000     2000
    4    alpha group 4 2000     2000
    5    alpha group 5 3000     1000
    6    alpha group 6 2000     2000
    7    alpha group 7 2000     2000
    8    alpha group 8 3000     3000
    9    bravo group 1 2000     2000
    10   bravo group 2 2000     2000
    11   bravo group 3 1000     1000
    12   bravo group 4 1000     1000
    13   bravo group 5 2000        0
    14   bravo group 6 2000     2000
    15   bravo group 7 2000     2000
    16   bravo group 8 2000     2000
    17 charlie group 1 1000     1000
    18 charlie group 2 2000     2000
    19 charlie group 3 3000     3000
    20 charlie group 4 2000     2000
    21 charlie group 5 1000        0
    22 charlie group 6 2000     2000
    ...
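
For completeness, here is a minimal sketch of the overwrite variant described above, written with group_by()/ungroup() so that it should also run on dplyr versions older than 1.1.0, where .by is not available. The toy data frame and the result name are hypothetical, made up only for this illustration, and the sketch assumes each domain contains exactly one "group 1" row and one "group 5" row.

```r
library(dplyr)

# Hypothetical toy data: two domains with three groups each
toy <- data.frame(
  domain = rep(c("alpha", "bravo"), each = 3),
  group  = rep(c("group 1", "group 2", "group 5"), times = 2),
  freq   = c(2000, 2000, 3000, 2000, 1000, 2000)
)

result <- toy %>%
  group_by(domain) %>%
  # Overwrite freq: only the group 5 row changes, to (group 5 - group 1).
  # Assumes exactly one "group 1" row per domain.
  mutate(freq = ifelse(group == "group 5",
                       freq - freq[group == "group 1"],
                       freq)) %>%
  ungroup()

result
```

If a domain could be missing its "group 1" row, freq[group == "group 1"] would have length zero and the subtraction would likely fail or give unexpected results, so it may be worth checking for that before running this on new datasets.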