Search code examples
rloopsindexingcolumnsorting

How can I create new columns, based on the numeric value of another column


I have a df (chpt4) with 1000+ participants, and the dates when tests where taken. I would like to accomodate the dates according to how many months have passed between the follow up (t1:t4) and the baseline (t0). For this purpose I created 4 additional columns (difft0t2:difft0t4) that show exaclty the months elapsed between the tests. The image is what I have now.

this is what i have

I am grouping the months in 5 different categories: (also I thought this vectors would help me as a counter)

FU6 <- 1:9
FU12 <- 10:18 
FU24 <- 19:30
FU36 <- 31:42
FU48 <- 43:54

My original idea was to start indexing the values of the difft0t1 column, that belong to the above ranges using which()

which(chpt4$difft0t1 %in% c(FU6)) #this works

which(chpt4$difft0t1 %in% c(FU14)) #this doesn't work at all 

...and use that outcome number, as an index of which element to paste into another column. Its just not working.

keeping with the image example from lines 243 and 244, I would like to outcome columns to look like this:

baseline FU6 FU12 FU24 FU36 FU48
2012-02-24 NA 2013-09-06 2014-02-21 2015-06-23 NA
2012-05-24 NA 2013-05-16 NA 2015-04-20 2016-05-12

Solution

  • I think you need this

    library (tidyverse)
    df %>% pivot_longer(cols = -id, names_to = "Test", values_to = "Dates") %>%
      group_by(id) %>% mutate(new_col = as.numeric(round((Dates - first(Dates))/30,0))) %>%
      mutate(new_col = case_when(new_col == 0 ~ "Baseline",
                                  new_col %in% 1:9 ~ "FU6",
                                  new_col %in% 10:18 ~ "FU12",
                                  new_col %in% 19:30 ~ "FU24",
                                  new_col %in% 31:42 ~ "FU36",
                                  new_col %in% 43:54 ~ "FU48")) %>% filter(!is.na(new_col)) %>%
      select(-Test) %>% pivot_wider(id_cols = "id", names_from = "new_col", values_from = "Dates", values_fn = min)
      
    # A tibble: 4 x 6
    # Groups:   id [4]
      id     Baseline   FU12       FU24       FU36       FU48      
      <chr>  <date>     <date>     <date>     <date>     <date>    
    1 waa000 2012-10-04 2013-09-05 NA         NA         NA        
    2 waf84  2012-02-24 NA         2013-09-06 2015-06-23 NA        
    3 waq593 2012-05-24 2013-05-16 NA         2015-04-20 2016-05-12
    4 wcu776 2013-01-24 2014-01-23 NA         NA         NA       
    

    NOTE whenever there will be two dates in one group, minimum/first of those will be displayed. FU6 category will automatically in picture once the appropriate data is used. sample data used

    dput(df)
    
    > dput(df)
    structure(list(id = c("waa000", "waf84", "waq593", "wcu776"), 
        t0 = structure(c(15617, 15394, 15484, 15729), class = "Date"), 
        t1 = structure(c(15953, 15954, 15841, 16093), class = "Date"), 
        t2 = structure(c(NA, 16122, 16545, NA), class = "Date"), 
        t3 = structure(c(NA, 16609, 16933, NA), class = "Date"), 
        t4 = structure(c(NA_real_, NA_real_, NA_real_, NA_real_), class = "Date")), row.names = c(NA, 
    -4L), class = "data.frame")
    
    > df
          id         t0         t1         t2         t3   t4
    1 waa000 2012-10-04 2013-09-05       <NA>       <NA> <NA>
    2  waf84 2012-02-24 2013-09-06 2014-02-21 2015-06-23 <NA>
    3 waq593 2012-05-24 2013-05-16 2015-04-20 2016-05-12 <NA>
    4 wcu776 2013-01-24 2014-01-23       <NA>       <NA> <NA>