r, dataframe, dplyr

How do I generate a new dataset based on subtracting values from existing columns in R?


Apologies if this is very basic; I'm quite new to R. I have a dataset with days, categories and values, like the one below.

Day Category Value
1 A 0.5
3 A 0.1
1 B 1
4 B 0.7
1 C 1.2
2 C 1
5 C 0.9

I want to create a new dataset with a "difference" column, i.e. how much the value has decreased from day 1 to day N (the last day) for each category. It will have as many rows as I have categories, so 3 rows in this example. For categories with more than 2 dates, I just want to use the start date (day 1) and the end date, ignoring the dates in between. I suspect the mutate() function may be able to help, but I am unsure how to point it at the start and end of the date range. Any suggestions would be appreciated.


Solution

  • I find square brackets useful here for selecting the specific Values within each Category: Value[which.max(Day)] picks the value on the last day, and Value[Day == 1] picks the value on day 1.

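    library(dplyr)
    # One row per Category: value on the last day minus value on day 1.
    # The .by argument needs dplyr 1.1.0 or later.
    dat %>%
        summarise(diff_value = Value[which.max(Day)] - Value[Day == 1], .by = Category)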
    
    ##  Category diff_value
    ##1        A       -0.4
    ##2        B       -0.3
    ##3        C       -0.3
    

    Here dat is the data frame from @Stefan's answer, holding the Day, Category and Value columns from the question.
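For a self-contained run, here is a minimal sketch that assumes dat simply mirrors the table in the question (the data frame below is rebuilt from that table, not copied from the other answer). It also shows a group_by() variant, which should give the same differences on dplyr versions older than 1.1.0, where .by is not available. The names res and res_old are only illustrative.

```r
library(dplyr)

# Assumption: dat reproduces the example table from the question.
dat <- data.frame(
  Day      = c(1, 3, 1, 4, 1, 2, 5),
  Category = c("A", "A", "B", "B", "C", "C", "C"),
  Value    = c(0.5, 0.1, 1, 0.7, 1.2, 1, 0.9)
)

# Per category: value on the last recorded day minus the value on day 1.
# The .by argument needs dplyr 1.1.0 or later.
res <- dat %>%
  summarise(diff_value = Value[which.max(Day)] - Value[Day == 1], .by = Category)

# Same computation with explicit grouping, for older dplyr versions.
res_old <- dat %>%
  group_by(Category) %>%
  summarise(diff_value = Value[which.max(Day)] - Value[Day == 1])

print(res)
```

Both versions rely on every category having exactly one row with Day == 1, as in the question. If a category could start on a later day, Value[which.min(Day)] would pick its earliest day instead.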