Using purrr and map to extract information from grouped data


I have a data set from which I need to extract per-group summaries, such as the minimum time, maximum time, etc.

> data
# A tibble: 9 x 3
  DateTime            Location Temperature
  <dttm>              <chr>          <dbl>
1 2022-01-30 18:00:00 A               122 
2 2022-01-30 18:00:00 B               123 
3 2022-01-30 18:00:20 C               112 
4 2022-01-30 18:01:00 A               123 
5 2022-01-30 18:01:00 B               124 
6 2022-01-30 18:01:20 C               114 
7 2022-01-30 18:02:00 A               122.
8 2022-01-30 18:02:00 B               123 
9 2022-01-30 18:02:20 C               115 

I would like to get a summary something like this:

Location   Min                       Max
A          2022-01-30 18:00:00       2022-01-30 18:02:00
B          2022-01-30 18:00:00       2022-01-30 18:02:00
C          2022-01-30 18:00:20       2022-01-30 18:02:20

I was able to split it into grouped tibbles using the following:

> data_grouped <- data %>%
+   split(.$Location)
> data_grouped
$A
# A tibble: 3 x 3
  DateTime            Location Temperature
  <dttm>              <chr>          <dbl>
1 2022-01-30 18:00:00 A               122 
2 2022-01-30 18:01:00 A               123 
3 2022-01-30 18:02:00 A               122.

$B
# A tibble: 3 x 3
  DateTime            Location Temperature
  <dttm>              <chr>          <dbl>
1 2022-01-30 18:00:00 B                123
2 2022-01-30 18:01:00 B                124
3 2022-01-30 18:02:00 B                123

$C
# A tibble: 3 x 3
  DateTime            Location Temperature
  <dttm>              <chr>          <dbl>
1 2022-01-30 18:00:20 C                112
2 2022-01-30 18:01:20 C                114
3 2022-01-30 18:02:20 C                115

But I cannot get any further. Can someone offer me some suggestions? A working copy of the data is below.

library(tidyverse)
library(lubridate)
library(purrr)


data <- tibble(
  DateTime = ymd_hms("2022-01-30 18:00:00",
                     "2022-01-30 18:00:00",
                     "2022-01-30 18:00:20",
                     "2022-01-30 18:01:00",
                     "2022-01-30 18:01:00",
                     "2022-01-30 18:01:20",
                     "2022-01-30 18:02:00",
                     "2022-01-30 18:02:00",
                     "2022-01-30 18:02:20"),
  Location = rep(c("A","B","C"),3),
  Temperature = c(122,123,112,123,124,114,122.5,123,115)
)

Thank you kindly!

Shawn Way


Solution

  • This can be done with group_by/summarise, using min/max on DateTime within each Location:

    library(dplyr)
    data %>%
       group_by(Location) %>%
       summarise(Min = min(DateTime), Max = max(DateTime))
    

    Splitting into a list and then looping is not really needed. If the goal is just to understand how map works, loop over the split list with map_dfr and apply summarise to each element to return the min/max as columns; the _dfr suffix row-binds the resulting list of one-row tibbles into a single tibble.

    library(purrr)
    map_dfr(data_grouped, ~ .x %>% 
      summarise(Location = first(Location), 
        Min = min(DateTime), Max = max(DateTime)))
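
As a hedged variation on the map approach above: in purrr 1.0.0 and later, map_dfr is marked as superseded in favour of map followed by list_rbind. The sketch below assumes purrr >= 1.0.0 is installed, rebuilds the question's data with base as.POSIXct instead of lubridate, and assumes a UTC time zone. It takes Location from the names of the split list via names_to rather than from first(Location); the name `result` is only illustrative.

```r
library(dplyr)
library(purrr)

# Same values as the question's data; the UTC time zone is an assumption
data <- tibble(
  DateTime = as.POSIXct(c("2022-01-30 18:00:00", "2022-01-30 18:00:00",
                          "2022-01-30 18:00:20", "2022-01-30 18:01:00",
                          "2022-01-30 18:01:00", "2022-01-30 18:01:20",
                          "2022-01-30 18:02:00", "2022-01-30 18:02:00",
                          "2022-01-30 18:02:20"), tz = "UTC"),
  Location = rep(c("A", "B", "C"), 3),
  Temperature = c(122, 123, 112, 123, 124, 114, 122.5, 123, 115)
)

# Named list of tibbles, one element per Location
data_grouped <- split(data, data$Location)

# Summarise each element, then row-bind the one-row tibbles;
# names_to stores the list names ("A", "B", "C") in a Location column
result <- data_grouped %>%
  map(~ summarise(.x, Min = min(DateTime), Max = max(DateTime))) %>%
  list_rbind(names_to = "Location")

print(result)
```

This should give one row per Location with Min and Max columns, the same shape as the group_by/summarise result.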