Search code examples
rdataframedplyrtidyversetibble

dplyr & tibble - conditional sum of two rows based on column value


Given a tibble looking as the one below, I'm trying to use the Tidyverse to perform a conditional sum based on the value of Item in each of the two Fields. Specifically, both for foo and bar, I'd like to sum the value corresponding to item a to that of item b and then remove the row of the former. The result I'm looking for is exemplified in table 2.

data <- tibble(Field = rep(c("foo", "bar"), each=4),
               Item = rep(c("a", "b", "c", "d"), 2),
               Value = runif(8))
# table 1                          # table 2
| Field | Item |   Value   |       | Field | Item |   Value   |
|-------|------|-----------|       |-------|------|-----------|
|  foo  |  a   | 0.8167347 |       |  foo  |  b   | 0.9583989 | <== 0.8167347 + 0.1416642
|  foo  |  b   | 0.1416642 |       |  foo  |  c   | 0.7054814 |
|  foo  |  c   | 0.7054814 |       |  foo  |  d   | 0.1196948 |
|  foo  |  d   | 0.1196948 |       |--------------------------|
|--------------------------|       |  bar  |  b   | 0.6177568 | <== 0.3604500 + 0.2573068
|  bar  |  a   | 0.3604500 |       |  bar  |  c   | 0.7003040 |
|  bar  |  b   | 0.2573068 |       |  bar  |  d   | 0.8131556 |
|  bar  |  c   | 0.7003040 |
|  bar  |  d   | 0.8131556 |

So far I didn't manage to get anywhere near the expected result. I know how to use dplyr's grouping function to isolate items belonging to either one of the two fields, but I have no idea how to select the value of a and sum it to b after the grouping has been performed.


Solution

  • You can change Item, to receive b when it is equal a, then summarise

    library(dplyr)
    

    Data

    set.seed(123)
    data <- tibble(Field = rep(c("foo", "bar"), each=4),
                   Item = rep(c("a", "b", "c", "d"), 2),
                   Value = runif(8))
    
    # A tibble: 8 x 3
    Field Item   Value
    <chr> <chr>  <dbl>
    1 foo   a     0.288 
    2 foo   b     0.788 
    3 foo   c     0.409 
    4 foo   d     0.883 
    5 bar   a     0.940 
    6 bar   b     0.0456
    7 bar   c     0.528 
    8 bar   d     0.892
    

    Result

    data %>% 
      mutate(Item = if_else(Item == "a","b",Item)) %>% 
      group_by(Field,Item) %>% 
      summarise(Value = sum(Value,na.rm = TRUE)) %>% 
      ungroup()
    
    # A tibble: 6 x 3
    Field Item  Value
    <chr> <chr> <dbl>
    1 bar   b     0.986
    2 bar   c     0.528
    3 bar   d     0.892
    4 foo   b     1.08 
    5 foo   c     0.409
    6 foo   d     0.883