Apologies if this is very basic; I'm quite new to R. I have a dataset with days, categories, and values, like the one below.
Day | Category | Value |
---|---|---|
1 | A | 0.5 |
3 | A | 0.1 |
1 | B | 1 |
4 | B | 0.7 |
1 | C | 1.2 |
2 | C | 1 |
5 | C | 0.9 |
I want to create a column of the "difference", i.e. how much the value has decreased from day 1 to day N for each category (it will have as many rows as I have categories, so 3 rows in this example). For the categories with more than 2 dates, I want to just take the start date (day 1) and the end date, ignoring the dates in between. I suspect the mutate() function may be able to help me, but I am unsure how to direct it to take the start-end date range. Any suggestions would be appreciated.
I find square brackets useful here for selecting the specific `Value`s within each `Category`:
    library(dplyr)

    dat %>%
      summarise(diff_value = Value[which.max(Day)] - Value[Day == 1], .by = Category)
    ##   Category diff_value
    ## 1        A       -0.4
    ## 2        B       -0.3
    ## 3        C       -0.3
Using `dat` from @Stefan's answer.
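Since `dat` is defined in another answer, here is a minimal self-contained sketch. The `data.frame()` call is my own reconstruction from the table in the question, so it may not match the original definition exactly, and `res` is just an illustrative name. Note that the `.by` argument needs dplyr 1.1.0 or later.

```r
library(dplyr)  # .by in summarise() requires dplyr >= 1.1.0

# Assumption: rebuilt by hand from the table in the question
dat <- data.frame(
  Day      = c(1, 3, 1, 4, 1, 2, 5),
  Category = c("A", "A", "B", "B", "C", "C", "C"),
  Value    = c(0.5, 0.1, 1, 0.7, 1.2, 1, 0.9)
)

# One row per Category: Value on the latest Day minus Value on Day 1.
# Inside summarise(), Value and Day refer to the rows of the current group,
# so the square brackets index within each Category.
res <- dat %>%
  summarise(
    diff_value = Value[which.max(Day)] - Value[Day == 1],
    .by = Category
  )

res
```

If some categories might not start on day 1, swapping `Value[Day == 1]` for `Value[which.min(Day)]` would take the earliest available day instead. That is a variation on the approach above, not something the question asks for.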