Search code examples
rdataframedplyrrowsdifference

Difference between rows based on a condition on R


I searched a lot for this answer, but I couldn't find anything that would help me, so here it goes:

I need to change the value on a certain group of rows based on the value from another group. My dataframe is organized like this:

df <- data.frame(group = c("Apple Juice", "Apple Juice", "Apple Juice", "Bottle", "Bottle", "Bottle"), month = c(1,2,3,1,2,3), cost = c(10,10,10,4,4,4))
group month cost
Apple Juice 1 10
Apple Juice 2 10
Apple Juice 3 10
Bottle 1 4
Bottle 2 4
Bottle 3 4

In which the cost of the bottle is a part of the apple juice's cost, but I need to take this value off for the second part of the job, while keeping the same structure:

group month cost
Apple Juice 1 6
Apple Juice 2 6
Apple Juice 3 6
Bottle 1 4
Bottle 2 4
Bottle 3 4

Usually I do this with dplyr and it used to work:

df <- df %>%
  group_by(group , month) %>%
  dplyr::summarise(cost = -cost[group == 'Bottle'], group = 'Apple Juice') %>%
  ungroup()

But for some reason when I'm running this code I'm getting this result:

group month cost
Apple Juice 1 -4
Apple Juice 2 -4
Apple Juice 3 -4

How can I fix this?


Solution

  • We could only group by 'month' and get the difference

    library(dplyr)
    df %>%
       group_by(month) %>%
       mutate(cost = replace(cost, group != 'Bottle', diff(cost[2:1]))) %>% 
       ungroup
    

    -output

    # A tibble: 6 × 3
      group       month  cost
      <chr>       <dbl> <dbl>
    1 Apple Juice     1     6
    2 Apple Juice     2     6
    3 Apple Juice     3     6
    4 Bottle          1     4
    5 Bottle          2     4
    6 Bottle          3     4