Search code examples
rportfoliorebalancing

R: create column to determine wether stock is bought, held or sold


I have a dataset of portfolio components that gives information about the 10 stocks held in my portfolio at each rebalancing date(monthly).

An example dataframe with just 4 dates and 4 stocks:

Date <- c(rep(as.Date("2010/12/14"), 4), rep(as.Date("2011/01/13"), 4), rep(as.Date("2011/02/10"), 4),rep(as.Date("2011/03/10"), 4))

Name <- c("A","B","C","D","A","C","F","G","A","B","F","H","A","F","H","I")

df <- data.frame(Date, Name)

The stocks vary between the dates, so i need to compute direct transaction costs based on wether i buy, hold or sell each stock. What i would like is to add a column with values of 0,1,2 depending on this, so that:

  • value of 0 if the stock is held at time t-1 and time t
  • value of 1 if the stock is held at time t-1, but sold at time t
  • value of 1 if the stock is bought at time t-1, and held(not sold) at time t
  • value of 2 if if the stock is bought at time t-1, and sold at time t

With the values being assigned to the stock at t-1.

example of what this would look like:

Date          Stock      Status
2010-12-14    A          1
2010-12-14    B          2
2010-12-14    C          1
2010-12-14    D          2

2011-01-13    A          0
2011-01-13    C          1       
2011-01-13    F          1
2011-01-13    G          2

2011-02-10    A          0
2011-02-10    B          1
2011-02-10    F          0
2011-02-10    H          1

2011-03-10    A
2011-03-10    F
2011-03-10    H
2011-03-10    I

Solution

  • I think your example data is not enough to show all cases, but this should generally do:

     Date <- c(rep(as.Date("2010/12/14"), 4), rep(as.Date("2011/01/13"), 4), rep(as.Date("2011/02/10"), 4),rep(as.Date("2011/03/10"), 4))
        dummy <- rep(1, 16)
        Name <- c("A","B","C","D","A","C","F","G","A","B","F","H","A","F","H","I")
        df <- data.frame(Date, Name, dummy)
    
        le = LETTERS[1:9]
        temp = CJ(Name=le, Date=unique(Date))
    
        df = data.table(df)
        setkey(df, Name, Date)
        setkey(temp, Name, Date)
        df = df[temp]
    
    
        df[,value:=case_when(dummy==1 & shift(dummy, type = "lag", n = 1)==1 & shift(dummy, type = "lag", n = 2)==1 ~ 0,
                             dummy==0 & shift(dummy, type = "lag", n = 1)==1 & shift(dummy, type = "lag", n = 2)==1 ~ 1,
                             dummy==1 & shift(dummy, type = "lag", n = 1)==1 & shift(dummy, type = "lag", n = 2)==0 ~ 1,
                             dummy==0 & shift(dummy, type = "lag", n = 1)==1 & shift(dummy, type = "lag", n = 2)==0 ~ 2,
                             TRUE~88), by=Name][dummy==1]