Search code examples
rdplyraggregatesummarize

Aggregate, dcast and create new columns in R


I have a data frame for every second. From the data frame with 1 second interval, I managed to aggregate the data into 1-minute interval using the following code:

agg_cont <- df %>% group_by(Date, Hour, Minute, Status, Mean) %>% count(name = 'Occurrence')

Now I have a data frame as shown below,

Date Hour Minute Status Mean Occurrence
12/01/2022 00 00 a 20 60
12/02/2022 00 01 b 32 60
12/01/2022 00 02 a 21 60
12/02/2022 00 03 a 12 60
12/01/2022 00 04 a 23 20
12/01/2022 00 04 b 43 40
12/01/2022 00 05 a 33 60

Please note that the column 'Occurrence' denotes the number of seconds the status occurred in the specific minute. For the minute '04', if the occurrence for status 'a' and 'b' is 20 and 40, respectively, then status 'a' occurred for 20 sec in the particular minute.

With the above dataframe, I would like to have only one row for each minute and create new columns for each 'status' and the mean value that occurred in that particular minute.

Desired output:

Date Hour Minute a b Mean 'a' Mean'b'
12/01/2022 00 00 60 0 20 NA
12/02/2022 00 01 0 60 32 NA
12/01/2022 00 02 60 0 21 NA
12/01/2022 00 03 0 60 12 NA
12/01/2022 00 04 20 40 23 43
12/02/2022 00 05 60 0 33 NA

I am trying to use the dcast function to get the desired output.

Thanks


Solution

  • Here is a tidyverse way.

    agg_count <- read.table(text = "
    Date    Hour    Minute  Status  Mean    Occurrence
    12/01/2022  00  00  a   20  60
    12/02/2022  00  01  b   32  60
    12/01/2022  00  02  a   21  60
    12/02/2022  00  03  a   12  60
    12/01/2022  00  04  a   23  20
    12/02/2022  00  04  b   43  40
    12/02/2022  00  05  a   33  60
    ", header = TRUE)
    
    suppressPackageStartupMessages({
      library(dplyr)
      library(tidyr)
    })
    
    agg_count %>%
      pivot_wider(
        id_cols = c(Date, Hour, Minute),
        names_from = Status,
        values_from = c(Occurrence, Mean),
        values_fill = 0L
      )
    #> # A tibble: 7 × 7
    #>   Date        Hour Minute Occurrence_a Occurrence_b Mean_a Mean_b
    #>   <chr>      <int>  <int>        <int>        <int>  <int>  <int>
    #> 1 12/01/2022     0      0           60            0     20      0
    #> 2 12/02/2022     0      1            0           60      0     32
    #> 3 12/01/2022     0      2           60            0     21      0
    #> 4 12/02/2022     0      3           60            0     12      0
    #> 5 12/01/2022     0      4           20            0     23      0
    #> 6 12/02/2022     0      4            0           40      0     43
    #> 7 12/02/2022     0      5           60            0     33      0
    

    Created on 2023-03-27 with reprex v2.0.2


    As I asked in comment to the question, if the date is not to be considered, then the code below will put input rows with equal minutes in the same output rows.

    agg_count %>%
      pivot_wider(
        id_cols = c(Hour, Minute),
        names_from = Status,
        values_from = c(Occurrence, Mean),
        values_fill = 0L
      )
    #> # A tibble: 6 × 6
    #>    Hour Minute Occurrence_a Occurrence_b Mean_a Mean_b
    #>   <int>  <int>        <int>        <int>  <int>  <int>
    #> 1     0      0           60            0     20      0
    #> 2     0      1            0           60      0     32
    #> 3     0      2           60            0     21      0
    #> 4     0      3           60            0     12      0
    #> 5     0      4           20           40     23     43
    #> 6     0      5           60            0     33      0
    

    Created on 2023-03-27 with reprex v2.0.2