I have a dataset similar to the following.
Case <- c("A", "B", "C", "A", "B", "C", "A", "B", "C")
ObservationNumber <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
ObservedValue <- c(154, 152, 157, 173, 176, 171, 203, 205, 199)
And I am trying to create a new data frame where one column is the Case (A, B, or C) and the second column is the change in observed value from the first observation to the third observation. The number of rows would be the number of distinct cases (3 in the example above, but several hundred in the real dataset).
There are workarounds that could work for the example data, but my real data is several hundred cases with 50 observations each. As such, I need some way of doing this that does not require (1) creating a separate dataset for each case (e.g. by filtering) to perform the subtraction on, which is the only way I have been able to come up with on my own.
Does anybody have any other ideas for ways to do this?
**Also in the example the rows are in order, but in the real dataset they are not. As a result, I can’t just take the first and last row in each group; I need to take rows with the maximum and minimum values in each group. Both answers below address this well.
I like to use the dplyr
library for these kinds of grouping/summarising questions:
library(dplyr)
df %>%
arrange(ObservationNumber) %>%
group_by(Case) %>%
summarise(diff = last(ObservedValue) - first(ObservedValue))