Search code examples
rtimestampcorrelationtemperaturemeasurement

Dealing with two time series with unequal intervals


I am currently dealing with temperature measurements that have different time steps (picture below):

temperatures vs time

I want to compare the two temperatures and check if they correlate with each other (Pearson correlation). But for that, I need the same amount of values in each column (temp1 and temp2). To make a correlation test, I want to create a new data frame which includes only the two temperature values that were generated at the same time (matching values visible in picture below).

temperatures vs time, 2

As an output, I want to generate a data frame that has only the matching columns "time", "temp1" and "temp2" in it. In this case that would make only three datasets instead of the total eight rows.

wished output file

Do you have an idea how to achieve that?

I am fairly new to R and already searched about a solution, without success so far. Thanks in advance for your suggestions.


Solution

  • The image OCR'd well but please read up on how to use dput() to make it easier for folks to help you.

    ALSO: Image 1 in your question differs from Image 2 by more than color. You modified times between images which has nothing to do with R or R knowledgfe and is really unhelpful/confusing. Hence the reiterated suggestion to just use the output of dput into a code block.

    0 external dependency base R solution:

    read.csv(text="temp1,time1,temp2,time2
    21.875,01.11.18 01:54,22.500,01.11.18 01:40
    21.875,01.11.18 01:57,22.563,01.11.18 01:41
    21.813,01.11.18 01:58,22.563,01.11.18 01:51
    21.875,01.11.18 01:59,22.625,01.11.18 01:52
    21.875,01.11.18 02:03,22.563,01.11.18 01:53
    21.813,01.11.18 02:04,22.625,01.11.18 01:54
    21.875,01.11.18 02:05,22.625,01.11.18 02:05
    21.813,01.11.18 02:06,22.688,01.11.18 02:06",
             stringsAsFactors=FALSE) -> xdf
    
    xdf$time1 <- as.POSIXct(xdf$time1, format="%m.%d.%y %H:%M")
    xdf$time2 <- as.POSIXct(xdf$time2, format="%m.%d.%y %H:%M")
    
    setNames(
      merge(xdf[,1:2], xdf[,3:4], by.x="time1", by.y="time2"),
      c("time", "temp1", "temp2")
    )
    ##                  time  temp1  temp2
    ## 1 2018-01-11 01:54:00 21.875 22.625
    ## 2 2018-01-11 02:05:00 21.875 22.625
    ## 3 2018-01-11 02:06:00 21.813 22.688
    

    57 compiled 📦 dependency tidyverse solution:

      read.csv(text="temp1,time1,temp2,time2
      21.875,01.11.18 01:54,22.500,01.11.18 01:40
      21.875,01.11.18 01:57,22.563,01.11.18 01:41
      21.813,01.11.18 01:58,22.563,01.11.18 01:51
      21.875,01.11.18 01:59,22.625,01.11.18 01:52
      21.875,01.11.18 02:03,22.563,01.11.18 01:53
      21.813,01.11.18 02:04,22.625,01.11.18 01:54
      21.875,01.11.18 02:05,22.625,01.11.18 02:05
      21.813,01.11.18 02:06,22.688,01.11.18 02:06",
               stringsAsFactors=FALSE) -> xdf
    
      library(tidyverse)
    
      mutate(xdf, time1 = lubridate::mdy_hm(time1)) %>%
        mutate(time2 = lubridate::mdy_hm(time2)) -> xdf
    
      left_join(
        select(xdf, temp1, time1),
        select(xdf, temp2, time2),
        by = c("time1" = "time2")
      ) %>%
        filter(!is.na(temp2)) %>%
        select(time = time1, temp1, temp2)
      ##                  time  temp1  temp2
      ## 1 2018-01-11 01:54:00 21.875 22.625
      ## 2 2018-01-11 02:05:00 21.875 22.625
      ## 3 2018-01-11 02:06:00 21.813 22.688