Search code examples
rdataframedata-manipulationdata-filtering

Subsetting dataframe by unique ID and closest date in R


Good afternoon!

Currently I'm working with a large database that contains various information on how loans are paid by borrowers

The example is as follows:

Loan_ID <- c(1, 1, 1, 2, 2, 2)
Execution_Sum <- c(1000, 900, 800, 500, 600, 400)
Execution_Date <- c("2022-03-10", "2022-03-15", "2022-03-17", "2022-03-11", "2022-03-22", "2022-03-29")
df <- data.frame(Loan_ID, Execution_Sum, Execution_Date)
df$Execution_Date <- as.Date(df$Execution_Date)

I've got 3 columns: unique loan ID, Execution_Sum (what was paid by the borrower) and Execution_Date (when this payment actually happend).

The problem is that I want to create a subset of my data frame to get only 1 row for each unique loan (for each unique ID) in such a way that this row does not reflect all payments made by the borrower, but only 1 closest to a predetermined date (March 31, 2022 in my case)

So, the desired result should be:

1, 800, "2022-03-17" and 2, 400, "2022-03-29"

Because these two dates are the closest to March 31, 2022

Is it possible to create this kind of subset?

Thanks a lot for any help provided!


Solution

  • We may slice on the row (which.min) which gives the absolute minimum difference between the 'Execution_Date' and the predetermined date after grouping by 'Loan_ID'

    library(dplyr)
    df %>%
       group_by(Loan_ID) %>% 
       slice(which.min(abs(Execution_Date - as.Date("2022-03-31")))) %>%
       ungroup
    

    -output

    # A tibble: 2 × 3
      Loan_ID Execution_Sum Execution_Date
        <dbl>         <dbl> <date>        
    1       1           800 2022-03-17    
    2       2           400 2022-03-29