Search code examples
rdata-transform

Data-transformation with R and dplyr


I have data in the following format:

Location Species Date Count
Location1 Species1 01-01-2024 2
Location1 Species1 01-02-2024 4
Location1 Species1 01-03-2024 3
Location1 Species2 01-01-2024 6
Location1 Species2 01-03-2024 3

And I want to transform it with r and dplyr into something like this:

Location Species First Date Last Date Date with Max Count Max Count
Location1 Species1 01-01-2024 01-03-2024 01-02-2024 4
Location1 Species2 01-01-2024 01-03-2024 01-01-2024 6

I have some complicated ideas, but maybe there is a very easy solution to do this?


Solution

  • The following should achieve your desired result and makes use of the the dplyr package:

      # Data
    mydata <- data.frame(
      stringsAsFactors = FALSE,
              Location = c("Location1","Location1",
                           "Location1","Location1","Location1"),
               Species = c("Species1","Species1",
                           "Species1","Species2","Species2"),
                  Date = c("01-01-2024","01-02-2024",
                           "01-03-2024","01-01-2024","01-03-2024"),
                 Count = c(2L, 4L, 3L, 6L, 3L)
    )
      # Code 
    library(dplyr)
    
    result <- mydata %>%
          group_by(Location, Species) %>%
          summarise(
            First_Date = min(Date),
            Last_Date = max(Date),
            Date_with_Max_Count = max(Date[which.max(Count)]),
            Max_Count = max(Count),
            .groups = 'drop'
          )
    

    If two dates have the same max count, I'm returning the most recent date in this code. If you want the first date, you'd replace max(Date[which.max(Count)]) with min(Date[which.max(Count)]). If you wanted something more complicated, you could simply write a function for that and replace the min/max functions with your custom function.