Search code examples
rdatedplyrtime-seriesmax

How to get the value in the largest date of each name?


I need to use dplyr mutate to create a new column that can show me (perhaps repeated values) of each maximum date what the respective value is.

library(dplyr)

set.seed(1234)
df <- data.frame(
  dates = seq.Date(as.Date("2022-08-23"), 
                   as.Date("2022-09-11"),
                   by = "days"),
  names = c(
    "a", 
    "a",
    "b",
    "c",
    "a",
    "a",
    "c",
    "b",
    "a",
    "a",
    "b",
    "c",
    "a",
    "b",
    "c",
    "d",
    "b",
    "a",
    "b",
    "c"   
    ),
  value = rnorm(20,0,1)
)

glimpse(df)

df

How could I use mutate to insert a column that repeats the largest date value for each name until the name changes?


Solution

  • library(dplyr)
    df %>%
      arrange(names, dates) %>% 
      group_by(names) %>% 
      mutate(last_val = last(value))
    
    # A tibble: 20 × 4
    # Groups:   names [4]
       dates      names   value max_val
       <date>     <chr>   <dbl>   <dbl>
     1 2022-08-23 a     -1.21    -0.911
     2 2022-08-24 a      0.277   -0.911
     3 2022-08-27 a      0.429   -0.911
     4 2022-08-28 a      0.506   -0.911
     5 2022-08-31 a     -0.564   -0.911
     6 2022-09-01 a     -0.890   -0.911
     7 2022-09-04 a     -0.776   -0.911
     8 2022-09-09 a     -0.911   -0.911
     9 2022-08-25 b      1.08    -0.837
    10 2022-08-30 b     -0.547   -0.837
    11 2022-09-02 b     -0.477   -0.837
    12 2022-09-05 b      0.0645  -0.837
    13 2022-09-08 b     -0.511   -0.837
    14 2022-09-10 b     -0.837   -0.837
    15 2022-08-26 c     -2.35     2.42 
    16 2022-08-29 c     -0.575    2.42 
    17 2022-09-03 c     -0.998    2.42 
    18 2022-09-06 c      0.959    2.42 
    19 2022-09-11 c      2.42     2.42 
    20 2022-09-07 d     -0.110   -0.110
    

    Or if you don't care about the ordering

    df %>% 
      group_by(names) %>% 
      mutate(last_val = last(value, order_by = dates))