Search code examples
rauto-populateauto-populating

R: autopopulation of data and date complements in a date sequence?


I am trying to autopopulate data. I need a command to fill the gaps between dates: Intended Output shows the gaps with similar precedence fills. The filling requires to fill the missing date entries.

In R, how to autopopulate the data by filling the gaps between dates in some date sequence?


Minimal working examples

Input for the dates sequence (2016-12-25, 2017-01-05)

> aa<- data.frame(a=c(1,11,111),b=c(2,22,222),length=c(3,5,1),date=c(as.Date("28.12.2016",format="%d.%m.%Y"), as.Date("30.12.2016",format="%d.%m.%Y"), as.Date("01.01.2017",format="%d.%m.%Y")))
> 
> dateSeq<-seq.Date(as.Date("2016/12/25"), as.Date("2017/01/05"), "day")
> dateSeq
 [1] "2016-12-25" "2016-12-26" "2016-12-27" "2016-12-28" "2016-12-29"
 [6] "2016-12-30" "2016-12-31" "2017-01-01" "2017-01-02" "2017-01-03"
[11] "2017-01-04" "2017-01-05"

>
> aa
    a   b length       date
1   1   2      3 2016-12-28
2  11  22      5 2016-12-30
3 111 222      1 2017-01-01

that shows the data recorded. The NA-fills, shown to some extent by Joel Wilson, is done such that

dateSeq<-seq.Date(as.Date("2016/12/25"), as.Date("2017/01/05"), "day")
df<-data.frame(dateSeq)
df$date = as.Date(df$date, format = "%Y-%m-%d")
merge(df, aa, by = "date", all.x= TRUE)
     a   b       length  date
0.1  NA  NA      NA      2016-12-25
0.2  NA  NA      NA      2016-12-26
0.3  NA  NA      NA      2016-12-27
1    1   2       3       2016-12-28
0.4  NA  NA      NA      2016-12-29
2    11  22      5       2016-12-30
0.5  NA  NA      NA      2016-12-31
3    111 222     1       2017-01-01
0.6  NA  NA      NA      2017-01-02
0.7  NA  NA      NA      2017-01-03
0.8  NA  NA      NA      2017-01-04
0.9  NA  NA      NA      2017-01-05
0.10 NA  NA      NA      2017-01-06

where our goal is to fill the NA entries with something called similar precedence approach that depends on its neighbours.

Intended Output with the input date-complemented and similar precedence fills

     a   b       length  date
0.1  0.9 2       3       2016-12-25
0.2  1   2.1     2       2016-12-26
0.3  0.8 2.2     3       2016-12-27
1    1   2       3       2016-12-28
0.4  10  20      4       2016-12-29
2    11  22      5       2016-12-30
0.5  80  150     3       2016-12-31
3    111 222     1       2017-01-01
0.6  100 130     5       2017-01-02
0.7  50  200     3       2017-01-03
0.8  20  100     2       2017-01-04
0.9  14  40      5       2017-01-05
0.10 80  140     4       2017-01-06

Solution

  • My point was:

    dateSeq<-seq.Date(as.Date("2016/12/25"), as.Date("2017/01/05"), "day")
    df<-data.frame(dateSeq)
    df
    #          date
    # 1: 2016-12-25
    # 2: 2016-12-26
    # 3: 2016-12-27
    # 4: 2016-12-28
    # 5: 2016-12-29
    # 6: 2016-12-30
    # 7: 2016-12-31
    # 8: 2017-01-01
    # 9: 2017-01-02
    #10: 2017-01-03
    #11: 2017-01-04
    #12: 2017-01-05
    #13: 2017-01-06
    
    df$date = as.Date(df$date, format = "%Y-%m-%d")
    merge(df, aa, by = "date", all.x= TRUE)
    #          date   a   b length
    # 1: 2016-12-25  NA  NA     NA
    # 2: 2016-12-26  NA  NA     NA
    # 3: 2016-12-27  NA  NA     NA
    # 4: 2016-12-28   1   2      3
    # 5: 2016-12-29  NA  NA     NA
    # 6: 2016-12-30  11  22      5
    # 7: 2016-12-31  NA  NA     NA
    # 8: 2017-01-01 111 222      1
    # 9: 2017-01-02  NA  NA     NA
    #10: 2017-01-03  NA  NA     NA
    #11: 2017-01-04  NA  NA     NA
    #12: 2017-01-05  NA  NA     NA
    #13: 2017-01-06  NA  NA     NA