I have data in the following format:
Location | Species | Date | Count |
---|---|---|---|
Location1 | Species1 | 01-01-2024 | 2 |
Location1 | Species1 | 01-02-2024 | 4 |
Location1 | Species1 | 01-03-2024 | 3 |
Location1 | Species2 | 01-01-2024 | 6 |
Location1 | Species2 | 01-03-2024 | 3 |
And I want to transform it with r
and dplyr
into something like this:
Location | Species | First Date | Last Date | Date with Max Count | Max Count |
---|---|---|---|---|---|
Location1 | Species1 | 01-01-2024 | 01-03-2024 | 01-02-2024 | 4 |
Location1 | Species2 | 01-01-2024 | 01-03-2024 | 01-01-2024 | 6 |
I have some complicated ideas, but maybe there is a very easy solution to do this?
The following should achieve your desired result and makes use of the the dplyr
package:
# Data
mydata <- data.frame(
stringsAsFactors = FALSE,
Location = c("Location1","Location1",
"Location1","Location1","Location1"),
Species = c("Species1","Species1",
"Species1","Species2","Species2"),
Date = c("01-01-2024","01-02-2024",
"01-03-2024","01-01-2024","01-03-2024"),
Count = c(2L, 4L, 3L, 6L, 3L)
)
# Code
library(dplyr)
result <- mydata %>%
group_by(Location, Species) %>%
summarise(
First_Date = min(Date),
Last_Date = max(Date),
Date_with_Max_Count = max(Date[which.max(Count)]),
Max_Count = max(Count),
.groups = 'drop'
)
If two dates have the same max count, I'm returning the most recent date in this code. If you want the first date, you'd replace max(Date[which.max(Count)])
with min(Date[which.max(Count)])
. If you wanted something more complicated, you could simply write a function for that and replace the min/max functions with your custom function.