I am currently dealing with temperature measurements that have different time steps (picture below):
I want to compare the two temperatures and check if they correlate with each other (Pearson correlation). But for that, I need the same amount of values in each column (temp1 and temp2). To make a correlation test, I want to create a new data frame which includes only the two temperature values that were generated at the same time (matching values visible in picture below).
As an output, I want to generate a data frame that has only the matching columns "time", "temp1" and "temp2" in it. In this case that would make only three datasets instead of the total eight rows.
Do you have an idea how to achieve that?
I am fairly new to R and already searched about a solution, without success so far. Thanks in advance for your suggestions.
The image OCR'd well but please read up on how to use dput()
to make it easier for folks to help you.
ALSO: Image 1 in your question differs from Image 2 by more than color. You modified times between images which has nothing to do with R or R knowledgfe and is really unhelpful/confusing. Hence the reiterated suggestion to just use the output of dput
into a code block.
read.csv(text="temp1,time1,temp2,time2
21.875,01.11.18 01:54,22.500,01.11.18 01:40
21.875,01.11.18 01:57,22.563,01.11.18 01:41
21.813,01.11.18 01:58,22.563,01.11.18 01:51
21.875,01.11.18 01:59,22.625,01.11.18 01:52
21.875,01.11.18 02:03,22.563,01.11.18 01:53
21.813,01.11.18 02:04,22.625,01.11.18 01:54
21.875,01.11.18 02:05,22.625,01.11.18 02:05
21.813,01.11.18 02:06,22.688,01.11.18 02:06",
stringsAsFactors=FALSE) -> xdf
xdf$time1 <- as.POSIXct(xdf$time1, format="%m.%d.%y %H:%M")
xdf$time2 <- as.POSIXct(xdf$time2, format="%m.%d.%y %H:%M")
setNames(
merge(xdf[,1:2], xdf[,3:4], by.x="time1", by.y="time2"),
c("time", "temp1", "temp2")
)
## time temp1 temp2
## 1 2018-01-11 01:54:00 21.875 22.625
## 2 2018-01-11 02:05:00 21.875 22.625
## 3 2018-01-11 02:06:00 21.813 22.688
tidyverse
solution: read.csv(text="temp1,time1,temp2,time2
21.875,01.11.18 01:54,22.500,01.11.18 01:40
21.875,01.11.18 01:57,22.563,01.11.18 01:41
21.813,01.11.18 01:58,22.563,01.11.18 01:51
21.875,01.11.18 01:59,22.625,01.11.18 01:52
21.875,01.11.18 02:03,22.563,01.11.18 01:53
21.813,01.11.18 02:04,22.625,01.11.18 01:54
21.875,01.11.18 02:05,22.625,01.11.18 02:05
21.813,01.11.18 02:06,22.688,01.11.18 02:06",
stringsAsFactors=FALSE) -> xdf
library(tidyverse)
mutate(xdf, time1 = lubridate::mdy_hm(time1)) %>%
mutate(time2 = lubridate::mdy_hm(time2)) -> xdf
left_join(
select(xdf, temp1, time1),
select(xdf, temp2, time2),
by = c("time1" = "time2")
) %>%
filter(!is.na(temp2)) %>%
select(time = time1, temp1, temp2)
## time temp1 temp2
## 1 2018-01-11 01:54:00 21.875 22.625
## 2 2018-01-11 02:05:00 21.875 22.625
## 3 2018-01-11 02:06:00 21.813 22.688