Search code examples
rdataframefiltersubsetunique

Find the newest entry per client in R


I am trying to filter a dataframe in which I have three columns:

  1. date (format: "day/month/year")
  2. client name
  3. client spending on an specific product

I want to filter this df so I could get only the newest data purchase by client

Is there any way I could do this?


Solution

  • Let me first create a dummy data frame

    library(dplyr)
    
    names <- c("A", "B", "C", "D")
    
    client <- sample(names, size=20, replace=T)
    dates <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 20)
    amount <- sample(c(0:1000), size=20)
    df <- data.frame(dates, client, amount)
    

    So the data frame looks like this

            dates client amount
    1  1999-08-21      A    632
    2  1999-08-06      B    449
    3  1999-03-20      B    402
    4  1999-05-15      B    557
    5  1999-04-29      D    960
    6  1999-03-07      A    977
    7  1999-12-02      D    106
    8  1999-12-08      D    891
    9  1999-12-06      B    375
    10 1999-03-28      C    509
    11 1999-07-27      C    722
    12 1999-02-01      D    923
    13 1999-02-20      B    517
    14 1999-12-17      B    487
    15 1999-11-27      C    486
    16 1999-05-26      B    873
    17 1999-01-11      A    493
    18 1999-08-16      A    620
    19 1999-03-17      B    899
    20 1999-03-01      C    297
    

    You can then get filter the data

    result <- df %>% 
                 group_by(client) %>% 
                 filter(dates == max(dates))
    result
    

    which will give you the following result.

      dates      client amount
      <date>     <fct>   <int>
    1 1999-08-21 A         632
    2 1999-12-08 D         891
    3 1999-12-17 B         487
    4 1999-11-27 C         486