Tags: mysql, r, ms-access, nstimeinterval

Join and plot data with different timestamps in 10-minute intervals


I have 3 tables in an Access database with the same column names (TempDate and Temp) but different timestamps. The data was collected at 10-minute intervals, but each recording device had a different start time. I want to merge these into one table with a single TempDate column and one Temp column for each source table (temp1, temp2, temp3).

I need help doing this in either Access or R. I've started using R with MySQL code, but I'm still very new to it. Ultimately I want to join this data to another data frame that has a datetime stamp covering the same range of dates. I think I can manage that if someone can show me how to group by an interval. Finally, I want to plot the result using ggplot. Thanks in advance.

Data

    temp1 <- data.frame(TempDate = c("2020/08/11 07:13:01", "2020/08/11 07:23:01", "2020/08/11 07:33:01", "2020/08/11 07:43:01"), Temperature = c(1.610, -1.905, -1.905, -0.901))

    temp2 <- data.frame(TempDate = c("2020/08/11 07:10:01", "2020/08/11 07:20:01", "2020/08/11 07:30:01", "2020/08/11 07:40:01"), Temperature = c(15.641, 15.641, 15.641, 15.641))

    temp3 <- data.frame(TempDate = c("2020/08/11 07:19:01", "2020/08/11 07:29:01", "2020/08/11 07:39:01", "2020/08/11 07:49:01"), Temperature = c(2.062, 3.573, 4.076, 4.579))

    > temp3 # as an example
                 TempDate Temperature
    1 2020/08/11 07:19:01       2.062
    2 2020/08/11 07:29:01       3.573
    3 2020/08/11 07:39:01       4.076
    4 2020/08/11 07:49:01       4.579

    # What I want: row 1 holds the temps recorded from 07:10:00 to 07:19:59, etc.
                 TempDate    Temp1    Temp2   Temp3
    1 2020/08/11 07:10:00    1.610   15.641   2.062
    2 2020/08/11 07:20:00   -1.905   15.641   3.573
    3 2020/08/11 07:30:00   -1.905   15.641   4.076
    4 2020/08/11 07:40:00   -0.901   15.641   4.579

UPDATE: Thanks to Ben for the great answer that got me started on this problem. When I asked another question, floor_date was suggested. It worked better for my data than the cut function suggested by @Ben. With cut, I would get interval labels whose minutes ended in 9 (12:19) instead of 0 (12:10). I also tried TempDate+60 inside the cut call, but then some readings were assigned to the next 10-minute interval. The code below was more accurate.

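    library(dplyr)      # bind_rows, mutate, filter, select, arrange, %>%
    library(lubridate)  # floor_date

    # burrow, shade and sun are data frames with TempDate and Temperature columns
    tempdata <- bind_rows(burrow = burrow, shade = shade, sun = sun, .id = 'Series') %>%
      mutate(TempDate = as.POSIXct(TempDate, tz = "UTC"),
             # round each reading down to the start of its 10-minute clock interval
             TimeStamp = floor_date(TempDate, unit = '10 mins'),
             TimeStamp = as.POSIXct(TimeStamp, tz = "UTC")) %>%
      filter(TimeStamp > as.POSIXct("2020-08-12 13:29:00", tz = "UTC")) %>%
      select(Series, Temperature, TimeStamp) %>%
      arrange(TimeStamp)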
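
The difference between cut and floor_date described in the update can be reproduced with a small sketch. It assumes a hypothetical vector x, built from three of the temp3 timestamps in the question, whose first reading does not fall on a 10-minute clock boundary. With breaks = "10 min", cut starts its bins at the earliest timestamp (truncated to the minute), while floor_date rounds each value down to a clock boundary.

```r
library(lubridate)

# Hypothetical sample: the first reading is at 07:19:01, not on a :00/:10/:20 boundary
x <- as.POSIXct(c("2020-08-11 07:19:01",
                  "2020-08-11 07:29:01",
                  "2020-08-11 07:39:01"), tz = "UTC")

# cut() anchors the bins at the earliest timestamp, so they start at 07:19, 07:29, 07:39
cut_bins <- cut(x, breaks = "10 min")

# floor_date() anchors the bins at clock boundaries, so they start at 07:10, 07:20, 07:30
floor_bins <- floor_date(x, unit = "10 mins")

print(levels(cut_bins))
print(floor_bins)
```

If all loggers are binned together with cut, the bin edges depend on whichever reading happens to come first, which is one plausible reason the labels ended in 9 for the full data set.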

Solution

  • In R you could do the following, using a tidyverse approach.

    First, use bind_rows to stack all your data frames together, adding a source column that holds the name of the data frame each temperature came from. These names become the column names in the final result.

    Then, make sure TempDate is POSIXct. You can use cut to put your datetimes into 10-minute intervals.

    At this point, I would consider leaving the result as is for plotting with ggplot2, since it usually works better with data in "long" format than in "wide" format. If you do want the "wide" format, you can use pivot_wider from tidyr.

    library(dplyr)
    library(tidyr)
    
    bind_rows(temp1 = temp1, temp2 = temp2, temp3 = temp3, .id = 'source') %>%
      mutate(TempDate = as.POSIXct(TempDate),
             NewTempDate = cut(TempDate, breaks = "10 min")) %>%
      pivot_wider(id_cols = NewTempDate, names_from = source, values_from = Temperature)
    

    Output

      NewTempDate          temp1 temp2 temp3
      <fct>                <dbl> <dbl> <dbl>
    1 2020-08-11 07:10:00  1.61   15.6  2.06
    2 2020-08-11 07:20:00 -1.90   15.6  3.57
    3 2020-08-11 07:30:00 -1.90   15.6  4.08
    4 2020-08-11 07:40:00 -0.901  15.6  4.58
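
For the plotting step the question asks about, the long-format route mentioned above could look roughly like this. It is a minimal sketch that reuses the sample data from the question and bins with floor_date from lubridate, as in the asker's update. The object names long and p, the axis labels, and the choice of a line plot with points are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Sample data from the question
temp1 <- data.frame(TempDate = c("2020/08/11 07:13:01", "2020/08/11 07:23:01",
                                 "2020/08/11 07:33:01", "2020/08/11 07:43:01"),
                    Temperature = c(1.610, -1.905, -1.905, -0.901))
temp2 <- data.frame(TempDate = c("2020/08/11 07:10:01", "2020/08/11 07:20:01",
                                 "2020/08/11 07:30:01", "2020/08/11 07:40:01"),
                    Temperature = c(15.641, 15.641, 15.641, 15.641))
temp3 <- data.frame(TempDate = c("2020/08/11 07:19:01", "2020/08/11 07:29:01",
                                 "2020/08/11 07:39:01", "2020/08/11 07:49:01"),
                    Temperature = c(2.062, 3.573, 4.076, 4.579))

# Stack the tables in long format and assign each reading to its 10-minute bin
long <- bind_rows(temp1 = temp1, temp2 = temp2, temp3 = temp3, .id = "source") %>%
  mutate(TempDate = as.POSIXct(TempDate, format = "%Y/%m/%d %H:%M:%S", tz = "UTC"),
         TimeStamp = floor_date(TempDate, unit = "10 mins"))

# One line per source table, coloured by the source column
p <- ggplot(long, aes(x = TimeStamp, y = Temperature, colour = source)) +
  geom_line() +
  geom_point() +
  labs(x = "Time (10-minute bins)", y = "Temperature", colour = "Source")

print(p)
```

Because the source column already identifies each series, no pivot_wider step is needed before plotting.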