r timestamp correlation temperature measurement

Dealing with two time series with unequal intervals

I am currently dealing with temperature measurements that have different time steps (picture below):

temperatures vs time

I want to compare the two temperatures and check if they correlate with each other (Pearson correlation). But for that, I need the same amount of values in each column (temp1 and temp2). To make a correlation test, I want to create a new data frame which includes only the two temperature values that were generated at the same time (matching values visible in picture below).

temperatures vs time, 2

As an output, I want to generate a data frame that has only the matching columns "time", "temp1" and "temp2" in it. In this case that would make only three datasets instead of the total eight rows.

wished output file

Do you have an idea how to achieve that?

I am fairly new to R and already searched about a solution, without success so far. Thanks in advance for your suggestions.

Solution

The image OCR'd well but please read up on how to use dput() to make it easier for folks to help you.

ALSO: Image 1 in your question differs from Image 2 by more than color. You modified times between images which has nothing to do with R or R knowledgfe and is really unhelpful/confusing. Hence the reiterated suggestion to just use the output of dput into a code block.

0 external dependency base R solution:

read.csv(text="temp1,time1,temp2,time2
21.875,01.11.18 01:54,22.500,01.11.18 01:40
21.875,01.11.18 01:57,22.563,01.11.18 01:41
21.813,01.11.18 01:58,22.563,01.11.18 01:51
21.875,01.11.18 01:59,22.625,01.11.18 01:52
21.875,01.11.18 02:03,22.563,01.11.18 01:53
21.813,01.11.18 02:04,22.625,01.11.18 01:54
21.875,01.11.18 02:05,22.625,01.11.18 02:05
21.813,01.11.18 02:06,22.688,01.11.18 02:06",
         stringsAsFactors=FALSE) -> xdf

xdf$time1 <- as.POSIXct(xdf$time1, format="%m.%d.%y %H:%M")
xdf$time2 <- as.POSIXct(xdf$time2, format="%m.%d.%y %H:%M")

setNames(
  merge(xdf[,1:2], xdf[,3:4], by.x="time1", by.y="time2"),
  c("time", "temp1", "temp2")
)
##                  time  temp1  temp2
## 1 2018-01-11 01:54:00 21.875 22.625
## 2 2018-01-11 02:05:00 21.875 22.625
## 3 2018-01-11 02:06:00 21.813 22.688

57 compiled 📦 dependency `tidyverse` solution:

  read.csv(text="temp1,time1,temp2,time2
  21.875,01.11.18 01:54,22.500,01.11.18 01:40
  21.875,01.11.18 01:57,22.563,01.11.18 01:41
  21.813,01.11.18 01:58,22.563,01.11.18 01:51
  21.875,01.11.18 01:59,22.625,01.11.18 01:52
  21.875,01.11.18 02:03,22.563,01.11.18 01:53
  21.813,01.11.18 02:04,22.625,01.11.18 01:54
  21.875,01.11.18 02:05,22.625,01.11.18 02:05
  21.813,01.11.18 02:06,22.688,01.11.18 02:06",
           stringsAsFactors=FALSE) -> xdf

  library(tidyverse)

  mutate(xdf, time1 = lubridate::mdy_hm(time1)) %>%
    mutate(time2 = lubridate::mdy_hm(time2)) -> xdf

  left_join(
    select(xdf, temp1, time1),
    select(xdf, temp2, time2),
    by = c("time1" = "time2")
  ) %>%
    filter(!is.na(temp2)) %>%
    select(time = time1, temp1, temp2)
  ##                  time  temp1  temp2
  ## 1 2018-01-11 01:54:00 21.875 22.625
  ## 2 2018-01-11 02:05:00 21.875 22.625
  ## 3 2018-01-11 02:06:00 21.813 22.688

Dealing with two time series with unequal intervals

0 external dependency base R solution:

57 compiled 📦 dependency tidyverse solution:

57 compiled 📦 dependency `tidyverse` solution: