I have a data frame for every second. From the data frame with 1 second interval, I managed to aggregate the data into 1-minute interval using the following code:
agg_cont <- df %>% group_by(Date, Hour, Minute, Status, Mean) %>% count(name = 'Occurrence')
Now I have a data frame as shown below,
Date | Hour | Minute | Status | Mean | Occurrence |
---|---|---|---|---|---|
12/01/2022 | 00 | 00 | a | 20 | 60 |
12/02/2022 | 00 | 01 | b | 32 | 60 |
12/01/2022 | 00 | 02 | a | 21 | 60 |
12/02/2022 | 00 | 03 | a | 12 | 60 |
12/01/2022 | 00 | 04 | a | 23 | 20 |
12/01/2022 | 00 | 04 | b | 43 | 40 |
12/01/2022 | 00 | 05 | a | 33 | 60 |
Please note that the column 'Occurrence' denotes the number of seconds the status occurred in the specific minute. For the minute '04', if the occurrence for status 'a' and 'b' is 20 and 40, respectively, then status 'a' occurred for 20 sec in the particular minute.
With the above dataframe, I would like to have only one row for each minute and create new columns for each 'status' and the mean value that occurred in that particular minute.
Desired output:
Date | Hour | Minute | a | b | Mean 'a' | Mean'b' |
---|---|---|---|---|---|---|
12/01/2022 | 00 | 00 | 60 | 0 | 20 | NA |
12/02/2022 | 00 | 01 | 0 | 60 | 32 | NA |
12/01/2022 | 00 | 02 | 60 | 0 | 21 | NA |
12/01/2022 | 00 | 03 | 0 | 60 | 12 | NA |
12/01/2022 | 00 | 04 | 20 | 40 | 23 | 43 |
12/02/2022 | 00 | 05 | 60 | 0 | 33 | NA |
I am trying to use the dcast function to get the desired output.
Thanks
Here is a tidyverse
way.
agg_count <- read.table(text = "
Date Hour Minute Status Mean Occurrence
12/01/2022 00 00 a 20 60
12/02/2022 00 01 b 32 60
12/01/2022 00 02 a 21 60
12/02/2022 00 03 a 12 60
12/01/2022 00 04 a 23 20
12/02/2022 00 04 b 43 40
12/02/2022 00 05 a 33 60
", header = TRUE)
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
})
agg_count %>%
pivot_wider(
id_cols = c(Date, Hour, Minute),
names_from = Status,
values_from = c(Occurrence, Mean),
values_fill = 0L
)
#> # A tibble: 7 × 7
#> Date Hour Minute Occurrence_a Occurrence_b Mean_a Mean_b
#> <chr> <int> <int> <int> <int> <int> <int>
#> 1 12/01/2022 0 0 60 0 20 0
#> 2 12/02/2022 0 1 0 60 0 32
#> 3 12/01/2022 0 2 60 0 21 0
#> 4 12/02/2022 0 3 60 0 12 0
#> 5 12/01/2022 0 4 20 0 23 0
#> 6 12/02/2022 0 4 0 40 0 43
#> 7 12/02/2022 0 5 60 0 33 0
Created on 2023-03-27 with reprex v2.0.2
As I asked in comment to the question, if the date is not to be considered, then the code below will put input rows with equal minutes in the same output rows.
agg_count %>%
pivot_wider(
id_cols = c(Hour, Minute),
names_from = Status,
values_from = c(Occurrence, Mean),
values_fill = 0L
)
#> # A tibble: 6 × 6
#> Hour Minute Occurrence_a Occurrence_b Mean_a Mean_b
#> <int> <int> <int> <int> <int> <int>
#> 1 0 0 60 0 20 0
#> 2 0 1 0 60 0 32
#> 3 0 2 60 0 21 0
#> 4 0 3 60 0 12 0
#> 5 0 4 20 40 23 43
#> 6 0 5 60 0 33 0
Created on 2023-03-27 with reprex v2.0.2