Search code examples
rsumtime-seriesmutate

R: Sum two columns from different time series data frames of unequal length, indexing by timeStamp


I have the following dataframes:

df1 <-
    setNames(data.frame(
        as.POSIXct(
            c(
            "2022-07-29 00:00:00",
            "2022-07-29 00:00:05",
            "2022-07-29 00:05:00",
            "2022-07-29 00:05:05",
            "2022-07-29 00:10:00",
            "2022-07-29 00:15:00",
            "2022-07-29 00:20:00",
            "2022-07-29 00:20:05"
            )),
        c(1, 2, 3, 4, 5, 6, 7, 8)
    ),
    c("timeStamp", "value1"))
df2 <-
    setNames(data.frame(
        as.POSIXct(
            c(
            "2022-07-29 00:00:05",
            "2022-07-29 00:05:00",
            "2022-07-29 00:05:03",
            "2022-07-29 00:10:02",
            "2022-07-29 00:20:05"
            )),
        c(1, 2, 3, 4, 5)
    ),
    c("timeStamp", "value1"))

What I want to do, in a new column in df1, is sum the two "value1" columns together when they are from the same time stamp, and if there is only one value at a certain time stamp, then take that value. so the result should be:

df_new <- df1 %>% mutate(total = df1$value1 + df2$value1)
...

so total would have 10 values. however I always get an error: longer object length is not a multiple of shorter object length


Solution

  • Something like this?

    df1 |> 
        full_join(df2, by = "timeStamp") |>
        group_by(timeStamp) |> 
        mutate(total = sum(value1.x, value1.y, na.rm = TRUE))
    
    
    # Groups:   timeStamp [10]
       timeStamp           value1.x value1.y total
       <dttm>                 <dbl>    <dbl> <dbl>
     1 2022-07-29 00:00:00        1       NA     1
     2 2022-07-29 00:00:05        2        1     3
     3 2022-07-29 00:05:00        3        2     5
     4 2022-07-29 00:05:05        4       NA     4
     5 2022-07-29 00:10:00        5       NA     5
     6 2022-07-29 00:15:00        6       NA     6
     7 2022-07-29 00:20:00        7       NA     7
     8 2022-07-29 00:20:05        8        5    13
     9 2022-07-29 00:05:03       NA        3     3
    10 2022-07-29 00:10:02       NA        4     4
    
    

    Addendum:

    
    df1 |> 
        full_join(df2, by = "timeStamp") |>
        group_by(timeStamp) |> 
        mutate(total = sum(value1.x, value1.y, na.rm = TRUE)) |> 
        ungroup() |> 
        mutate(ctotal = total + lag(total, 1),
               ctotal = ifelse(is.na(ctotal), total, ctotal))
    
    
       timeStamp           value1.x value1.y total ctotal
       <dttm>                 <dbl>    <dbl> <dbl>  <dbl>
     1 2022-07-29 00:00:00        1       NA     1      1
     2 2022-07-29 00:00:05        2        1     3      4
     3 2022-07-29 00:05:00        3        2     5      8
     4 2022-07-29 00:05:05        4       NA     4      9
     5 2022-07-29 00:10:00        5       NA     5      9
     6 2022-07-29 00:15:00        6       NA     6     11
     7 2022-07-29 00:20:00        7       NA     7     13
     8 2022-07-29 00:20:05        8        5    13     20
     9 2022-07-29 00:05:03       NA        3     3     16
    10 2022-07-29 00:10:02       NA        4     4      7