Search code examples
rdataframedplyrformattable

Aggregating cells in dataframe into a sequence


I'm trying to turn this dataframe in R:

tableDataSortedCols <- data.frame(
    "groups" = c(1,1,1,1,1,1,
                 1,1,1,1,1,1,
                 2,2,2,2,2,2,
                 2,2,2,2,2,2,
                 2,2,2,2,2,2,
                 3,3,3,3,3,3), 
    "element" = c("A","A","A","A","A","A",
                  "B","B","B","B","B","B",
                  "C","C","C","C","C","C",
                  "D","D","D","D","D","D",
                  "E","E","E","E","E","E",
                  "F","F","F","F","F","F"), 
    "hours" = c(0,4,8,12,16,20,
                0,4,8,12,16,20,
                0,4,8,12,16,20,
                0,4,8,12,16,20,
                0,4,8,12,16,20,
                0,4,8,12,16,20),
    "values" = c(123.0,124.0,123.5,125.0,123.0,123.0,
                 223.0,224.0,223.5,225.0,223.0,223.0,
                 223.1,223.1,223.1,223.5,223.1,223.2,
                 233.1,234.0,233.5,235.0,233.0,233.0,
                 323.0,324.0,323.5,325.0,323.0,323.0,
                 523.0,524.0,523.5,525.0,523.0,523.0)
)

into this:

  groups element   timeseries
1      1       A   123.0 124.0 123.5 125.0 123.0 123.0
2              B   223.0 224.0 223.5 225.0 223.0 223.0
3      2       C   223.1 223.1 223.1 223.5 223.1 223.2
4              D   233.1 234.0 233.5 235.0 233.0 233.0
5              E   323.0 324.0 323.5 325.0 323.0 323.0
6      3       F   523.0 524.0 523.5 525.0 523.0 523.0

collapsing/aggregating the hours and values pairs into the sequence called timeseries where each element in the timeseries sequence corresponds to one hour value 0h, 4h, 8h, 12h, 16h, 20h.

This is what I did so far:

collapse_rows_df <- function(df, variable){
  group_var <- enquo(variable)

  df %>%
    group_by(!! group_var) %>%
    mutate(groupRow = 1:n()) %>%
    ungroup() %>%
    mutate(!!quo_name(group_var) := ifelse(groupRow == 1, as.character(!! group_var), "")) %>%
    select(-c(groupRow))
}     

tableOut <- tableDataSortedCols %>%
  group_by(groups) %>%
  select(groups, everything()) %>%
  distinct %>%
  collapse_rows_df(groups) %>%
  formattable()

Could you suggest a way to achieve that?


Solution

  • library(tidyverse)
    make_timeseries = function(hours, values) {
      paste(values[order(hours)], collapse = " ")
    }
    tableDataSortedCols %>%
      group_by(groups, element) %>%
      summarise(timeseries = make_timeseries(hours, values))