Search code examples
rdataframereplacelookup

Removing values depending on lookup condition


I have the following kind of dataframe, with thousands of columns and rows. First column contains dates, and the following columns contain asset return index corresponding to that date.

DATE Asset_1 Asset_2 Asset_3 Asset_4
1/1/2000 1000 300 2900 NA
2/1/2000 1100 350 2950 NA
3/1/2000 1200 330 2970 100
4/1/2000 1200 360 3000 200
5/1/2000 1200 370 3500 300

My problem is that delisted assets, such as Asset_1 in my example, remain in my dataset after the delisting, distorting my further processing. I have another dataframe where the delisting dates of the assets are listed. For example:

ASSET_CODE DELIST_DATE
Asset_1 3/1/2000
Asset_2 NA
Asset_3 NA
Asset_4 NA

What i want to achieve, is to remove all observations from delisted assets after the delisting date, i.e. if DATE>DELIST_DATE(ASSET_CODE). This would be quite straightforward in Excel using a combination of IF and indexmatching, but can't figure how to do this in R.

My desired output would be:

DATE Asset_1 Asset_2 Asset_3 Asset_4
1/1/2000 1000 300 2900 NA
2/1/2000 1100 350 2950 NA
3/1/2000 1200 330 2970 100
4/1/2000 NA 360 300 200
5/1/2000 NA 370 350 300

Solution

  • You can do this with a join + filter. But you have two add'l steps that need to be taken care of.

    1. convert the dates to a Date data format so they can be compared chronologically. While Excel famously likes to make everything into a date, R needs to be told that your character string represents a date and shouldn't be sorted "alphabetically."

    2. For a join, it's easier if your data is in long format.


    4 of the seven steps below relate to the dates and reshaping.

    library(dplyr)
    asset_returns %>%
      mutate(DATE = as.Date(DATE, "%m/%d/%Y")) %>%                     # DATES
      tidyr::pivot_longer(-DATE, names_to = "ASSET_CODE") %>%          # RESHAPE
      left_join(delistings) %>%
      mutate(DELIST_DATE = as.Date(DELIST_DATE, "%m/%d/%Y")) %>%       # DATES
      filter(DATE <= DELIST_DATE | is.na(DELIST_DATE)) %>%
      select(-DELIST_DATE) %>%
      tidyr::pivot_wider(names_from = ASSET_CODE, values_from = value) # RESHAPE
    

    Result

    Joining, by = "ASSET_CODE"
    # A tibble: 5 × 5
      DATE       Asset_1 Asset_2 Asset_3 Asset_4
      <date>       <int>   <int>   <int>   <int>
    1 2000-01-01    1000     300    2900      NA
    2 2000-02-01    1100     350    2950      NA
    3 2000-03-01    1200     330    2970     100
    4 2000-04-01      NA     360    3000     200
    5 2000-05-01      NA     370    3500     300
    

    Reproducible data

    asset_returns <- data.frame(
      stringsAsFactors = FALSE,
      DATE = c("1/1/2000","2/1/2000",
               "3/1/2000","4/1/2000","5/1/2000"),
      Asset_1 = c(1000L, 1100L, 1200L, 1200L, 1200L),
      Asset_2 = c(300L, 350L, 330L, 360L, 370L),
      Asset_3 = c(2900L, 2950L, 2970L, 3000L, 3500L),
      Asset_4 = c(NA, NA, 100L, 200L, 300L)
    )
    
    delistings <- data.frame(
      stringsAsFactors = FALSE,
            ASSET_CODE = c("Asset_1", "Asset_2", "Asset_3", "Asset_4"),
           DELIST_DATE = c("3/1/2000", NA, NA, NA)
                  )