This is how my data looks like:
dput(head(COR_trial,10))
structure(list(rDate = structure(c(1439995500, 1439995800, 1439996100,
1439996400, 1439996700, 1439997000, 1439997300, 1439997600, 1439997900,
1439998200), class = c("POSIXct", "POSIXt"), tzone = ""), CCRN630 = c(NA,
NA, NA, NA, NA, 0.2878412, NA, NA, NA, NA), CCRN800 = c(NA, NA,
NA, NA, 0.3213675, NA, NA, NA, NA, NA), CCRN532 = c(NA, NA, NA,
0.3327465, NA, NA, NA, NA, NA, NA), CCRN570 = c(NA, NA, NA, NA,
NA, 0.4172932, NA, NA, NA, NA)), row.names = c(NA, 10L), class = "data.frame")
And this is the head of the dataframe:
rDate CCRN630 CCRN800 CCRN532 CCRN570
1 2015-08-19 14:45:00 NA NA NA NA
2 2015-08-19 14:50:00 NA NA NA NA
3 2015-08-19 14:55:00 NA NA NA NA
4 2015-08-19 15:00:00 NA NA 0.3327465 NA
5 2015-08-19 15:05:00 NA 0.3213675 NA NA
6 2015-08-19 15:10:00 0.2878412 NA NA 0.4172932
From 2015-08-19 14:45:00
to 2018-10-11 13:00:00
(data distributed 5 in 5 minutes), I have 14 values for each one of the columns REFN630
, REFN800
, REFN532
and REFN570
(Note: their values occur in separate date and time, however sometimes they match)
I will give an example of one column REFN630
and the dates were the values are distributed.
rDate CCRN630
<dttm> <dbl>
1 2015-08-19 15:10:00 0.288
2 2015-10-23 10:40:00 0.129
3 2016-02-03 12:40:00 0.373
4 2016-03-24 13:25:00 0.392
5 2016-06-21 11:50:00 0.144
6 2016-07-15 11:35:00 0.195
7 2016-08-18 11:35:00 0.204
8 2016-12-20 13:00:00 0.22
9 2017-01-18 13:25:00 0.210
10 2017-02-17 13:05:00 0.237
11 2017-03-29 12:10:00 0.2
12 2017-05-03 10:30:00 0.174
13 2017-06-08 12:20:00 0.157
14 2017-07-11 11:55:00 0.164
As you can see a LOT of empty spaces need to be filled between the dates. There's any way to fill the empty spaces with the same value until the next defined value?
I found a way to interpolate the values with this code:
CCRN630<-fillGap(COR_trial$CCRN630, method=c("fixed"),rule=2)
Now I want to do something more simple than the interpolation but I don't know how to do it.
The desired output would be something like this:
rDate CCRN630 CCRN800 CCRN532 CCRN570
1 2015-08-19 14:45:00 0.2878412 0.3213675 0.3327465 0.4172932
2 2015-08-19 14:50:00 0.2878412 0.3213675 0.3327465 0.4172932
3 2015-08-19 14:55:00 0.2878412 0.3213675 0.3327465 0.4172932
4 2015-08-19 15:00:00 0.2878412 0.3213675 0.3327465 0.4172932
5 2015-08-19 15:05:00 0.2878412 0.3213675 0.3327465 0.4172932
6 2015-08-19 15:10:00 0.2878412 0.3213675 0.3327465 0.4172932
18670 2015-08-19 14:45:00 0.2878412 0.3213675 0.3327465 0.4172932
18671 2015-08-19 14:50:00 0.2878412 0.3213675 0.3327465 0.4172932
18672 2015-10-23 10:40:00 0.1287671 0.1181319 0.2111437 0.2463768
18673 2015-08-19 15:00:00 0.1287671 0.1181319 0.2111437 0.2463768
18674 2015-08-19 15:05:00 0.1287671 0.1181319 0.2111437 0.2463768
18675 2015-08-19 15:10:00 0.1287671 0.1181319 0.2111437 0.2463768
Any help will be much appreciated.
If I understand your question the right way, you could use dplyr
and tidyr
:
library(dplyr)
library(tidyr)
COR_trial %>%
complete(rDate = seq(min(rDate), max(rDate), by=300)) %>%
fill(starts_with("CCRN"))
complete
creates the missing Dates and Timesseq(min(rDate), max(rDate), by=300)
creates a sequence of dates and times starting at the lowest date/time in your dataset. The steps are always 5 minutes, counted in seconds, therefore by = 300
.fill
uses the known values und fills the rows until the next known value. If you want the rows filled upwards, you can change fill(starts_with("CCRN"))
with fill(starts_with("CCRN"), .direction="up")
.This returns
# A tibble: 18,648 x 5
rDate CCRN630 CCRN800 CCRN532 CCRN570
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-08-19 16:45:00 NA NA NA NA
2 2015-08-19 16:50:00 NA NA NA NA
3 2015-08-19 16:55:00 NA NA NA NA
4 2015-08-19 17:00:00 NA NA 0.333 NA
5 2015-08-19 17:05:00 NA 0.321 0.333 NA
6 2015-08-19 17:10:00 0.288 0.321 0.333 0.417
7 2015-08-19 17:15:00 0.288 0.321 0.333 0.417
8 2015-08-19 17:20:00 0.288 0.321 0.333 0.417
9 2015-08-19 17:25:00 0.288 0.321 0.333 0.417
10 2015-08-19 17:30:00 0.288 0.321 0.333 0.417
structure(list(rDate = structure(c(1445589600, 1439995500, 1439995800,
1439996100, 1439996400, 1439996700, 1439997000, 1439997300, 1439997600,
1439997900, 1439998200), tzone = "", class = c("POSIXct", "POSIXt"
)), CCRN630 = c(0.129, NA, NA, NA, NA, NA, 0.2878412, NA, NA,
NA, NA), CCRN800 = c(NA, NA, NA, NA, NA, 0.3213675, NA, NA, NA,
NA, NA), CCRN532 = c(NA, NA, NA, NA, 0.3327465, NA, NA, NA, NA,
NA, NA), CCRN570 = c(NA, NA, NA, NA, NA, NA, 0.4172932, NA, NA,
NA, NA)), row.names = c(NA, -11L), class = c("tbl_df", "tbl",
"data.frame"), problems = structure(list(row = 11L, col = NA_character_,
expected = "4 columns", actual = "5 columns", file = "literal data"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame")))