I'm trying to filter a dataframe of loan data, but each monthly report duplicates loans if they're still outstanding, or drops loans if paid (can't just use latest monthly report). I'd like to filter through lenders for unique Date of Maturity for loans and remove duplicates and keeping only the most recent data by report date. Here's an example of the data:
df <- data.frame(Reporting.date=c("6/30/2020","6/30/2020","6/30/2020","8/31/2021","8/31/2021"
,"8/31/2021","6/30/2020","7/31/2021","5/31/2020","12/31/2020")
, Lender.name=c("Lender1","Lender1","Lender1","Lender1","Lender1","Lender1"
,"Lender1","Lender1","Lender2","Lender2")
, Date.of.maturity=c("6/20/2025","6/20/2025","6/20/2025","6/20/2025","6/20/2025"
,"6/20/2025","6/30/2022","6/30/2022","5/15/2024","5/15/2024")
, Loan.amount=c(13129474,14643881,44935677,13129474,14643881,44935677
,150000,150000,2750000,2750000))
As you can see from the example data Lender1 has 2 unique maturity dates. The first date has 3 loans that get duplicated across 2 reporting dates, and the second maturity date has 1 loan being duplicated. I want to remove duplicates to keep the newest report data. My hope is to get a dataframe that looks like this afterwards:
Reporting.date | Lender.name | Date.of.maturity | Loan.amount |
---|---|---|---|
8/31/2021 | Lender1 | 6/20/2025 | 13129474 |
8/31/2021 | Lender1 | 6/20/2025 | 14643881 |
8/31/2021 | Lender1 | 6/20/2025 | 44935677 |
7/31/2021 | Lender1 | 6/30/2022 | 150000 |
12/31/2020 | Lender2 | 5/15/2024 | 2750000 |
You'll need to convert Reporting.date
to Date format, either in a mutate
(like I did) or directly in the filter
.
library(tidyverse)
df %>%
mutate(Reporting.date = as.Date(Reporting.date, format = '%m/%d/%Y')) %>%
group_by(Lender.name, Date.of.maturity, Loan.amount) %>%
filter(Reporting.date == max(Reporting.date)) %>%
ungroup()