Search code examples
rdplyrtidyversemagrittr

Slice based on multiple date ranges and multiple columns to formating a new dataframe with R


Let's say I have a sample data as follow:

df <- structure(list(date = structure(c(18912, 18913, 18914, 18915, 
18916, 18917, 18918, 18919, 18920, 18921, 18922, 18923), class = "Date"), 
    value1 = c(1.015, NA, NA, 1.015, 1.015, 1.015, 1.015, 1.015, 
    1.015, 1.015, 1.015, 1.015), value2 = c(1.115668, 1.104622, 
    1.093685, 1.082857, 1.072135, 1.06152, 1.05101, NA, NA, 1.0201, 
    1.01, 1), value3 = c(1.015, 1.030225, NA, NA, 1.077284, 1.093443, 
    1.109845, 1.126493, 1.14339, 1.160541, 1.177949, 1.195618
    )), row.names = c(NA, -12L), class = "data.frame")

and three date intervals:

date_range1 <- (date>='2021-10-12' & date<='2021-10-15')
date_range2 <- (date>='2021-10-16' & date<='2021-10-18')
date_range3 <- (date>='2021-10-21' & date<='2021-10-23')

I need to slice data from value1, value2 and value3 using date_range1, date_range2 and date_range3 respectively, and finally concatenate them to one column as follows:

enter image description here

Please note type 1, 2 and 3 are numbers to indicate date ranges: date_range1, date_range2 and date_range3.

How could I achieve that with R's packages? Thanks.

EDIT:

str(real_data)

Out:

tibble [1,537 x 5] (S3: tbl_df/tbl/data.frame)
 $ date                                                : chr [1:1537] "2008-01-31" "2008-02-29" "2008-03-31" "2008-04-30" ...
 $ value1: num [1:1537] 11.3 11.4 11.4 11.3 11.2 ...
 $ value2    : num [1:1537] 11.4 11.4 11.3 11.3 11.1 ...
 $ value3: num [1:1537] NA NA NA NA NA NA NA NA NA NA ...
 $ value4  : chr [1:1537] "11.60" "10.20" "12.55" "10.37" ...

Solution

  • You may use use dplyr::case_when

    library(dplyr)
    df %>%
      mutate(type = case_when(
        date>='2021-10-12' & date<='2021-10-15' ~ 1,
        date>='2021-10-16' & date<='2021-10-18' ~ 2,
        date>='2021-10-21' & date<='2021-10-23' ~ 3,
        TRUE ~ NA_real_
      ),
      value = case_when(
        type == 1 ~ value1,
        type == 2 ~ value2,
        type == 3 ~ value3,
        TRUE ~ NA_real_
      ))  %>%
      select(date, value, type) %>%
      filter(!is.na(type))
    
             date    value type
    1  2021-10-12 1.015000    1
    2  2021-10-13       NA    1
    3  2021-10-14       NA    1
    4  2021-10-15 1.015000    1
    5  2021-10-16 1.072135    2
    6  2021-10-17 1.061520    2
    7  2021-10-18 1.051010    2
    8  2021-10-21 1.160541    3
    9  2021-10-22 1.177949    3
    10 2021-10-23 1.195618    3