Search code examples
rfor-loopdplyrtidyversefinance

Trying to add a column of logicals by looping over time in a tibble in R


I started replicating the value factor used by Fama and French in to build a portfolio strategy for my final dissertation.

I have a dataset of monthly market caps from the S&P 500 over the years. I created a loop to determine whether the variable (mkt cap) of a determined observation at a certain date is higher or lower than a certain threshold computed cross-sectionally at the same time (across all the observations of the variable mkt cap at time t). To achieve this, I thought the appropriate technique to be a for loop. In this way for each date I calculate the threshold and check the criteria. Unfortunately I am not able to store the logical during the loop. When I print the results I can see what I would like to store but when I try to store I get only the results related to the last step of the loop.

for(d in dates$date){
  month <- data_tbk %>% filter(date==d) 
  up <- quantile(month$mktcap, 0.8, na.rm=TRUE)
  low <- quantile(month$mktcap, 0.2, na.rm=TRUE)
  data_tbk %>% filter(date==d) %>%
    mutate(ptf=ifelse(mktcap>=up,1,ifelse(mktcap<=low,0,NA))) %>%
    print
    }

Another way I tried to pursue is the following but I got even less:

data_tbk$ptf <- NA
for(d in dates$date){
  month <- data_tbk %>% filter(date==d)
  up <- quantile(month$mktcap, 0.8, na.rm=TRUE)
  low <- quantile(month$mktcap, 0.2, na.rm=TRUE)
  data_tbk %>% filter(date==d) %>% filter(mktcap>=up) %>% ptf=1
  filter(data_tbk, date==d) %>% filter(mktcap<=low) %>% ptf=0
  }

How should I change the codes to get a column containing the logical 1 or 0 according to the criteria?

data_pdk


Solution

  • You won't need a loop.

    Assuming your dataframe is data_tbk, this code will create new variable is_higher. 1 if mktcap more then Q80%, 0 if less than Q20%, and NA for the rest.

    library(dplyr)
    
    data_tbk  <- data_tbk %>% 
        mutate(is_higher = case_when(
                            mktcap >  quantile(mktcap,0.8) ~ 1,
                            mktcap <= quantile(mktcap,0.2) ~ 0,
                            TRUE ~ NA) )
    

    If you expect to calculate quantile per date, then add group_by clause.

    data_tbk <- data_tbk %>% 
        group_by(date) %>%
        mutate(is_higher = case_when(
                            mktcap >  quantile(mktcap,0.8) ~ 1,
                            mktcap <= quantile(mktcap,0.2) ~ 0,
                            TRUE ~ NA) )
    

    PS. You need to install package dplyr

    install.package("dplyr")