Search code examples
rif-statementdplyrsummarize

Summarize dataframe with a constraint using Dplyr


I have a dataframe similar to the one below:

data <- data.frame(x = c("0", "2", "8", "1", "7", "10", "15", "14", "13", "11"),
                   y = c("11", "5", "14", "9", "13", "7", "4", "0", "12", "8"),
                   act_x = c("Short", "Buy", "Short", "Buy", "Short", "Buy", "Short", "Buy", "Short", "Buy"),
                   act_y = c("Buy", "Short", "Buy", "Short", "Buy", "Short", "Buy", "Short", "Buy", "Short"))

I wish to create a profits column for x and a profits column for y based on the action taken for x and y. The result should look something like this:

res <- data.frame(data,
                  prof_x = c(NA, -2, 6, 7, 6, -3, 5, 1, -1, 2),
                  prof_y = c(NA, -6, -9, -5, -4, -6, 3, -4, -12, -4))

For example, starting at day 0 (the first line), I short x and buy y. The corresponding prices move and settle at day 1 (the second line). The profit of x is 0-2=-2 (since I shorted x) and the profit for y is 5-11=-6 (since I bought y). And so on...

Is there a friendly way to implement this in the Dplyr pipe? Does anyone have any suggestions outside of the pipe? Thanks in advance for any guidance.


Solution

  • A dplyr based solution using lag and mutate can be achieved as:

    library(dplyr)
    data %>% mutate(x = as.numeric(x), y = as.numeric(y)) %>%
      mutate(prof_x = ifelse(act_x == "Buy", lag(x)-x, x-lag(x))) %>%
      mutate(prof_y = ifelse(act_y == "Buy", lag(y)-y, y-lag(y)))
    # 
    # x  y act_x act_y prof_x prof_y
    # 1   0 11 Short   Buy     NA     NA
    # 2   2  5   Buy Short     -2     -6
    # 3   8 14 Short   Buy      6     -9
    # 4   1  9   Buy Short      7     -5
    # 5   7 13 Short   Buy      6     -4
    # 6  10  7   Buy Short     -3     -6
    # 7  15  4 Short   Buy      5      3
    # 8  14  0   Buy Short      1     -4
    # 9  13 12 Short   Buy     -1    -12
    # 10 11  8   Buy Short      2     -4
    

    Data:

    data <- data.frame(x = c("0", "2", "8", "1", "7", "10", "15", "14", "13", "11"),
            y = c("11", "5", "14", "9", "13", "7", "4", "0", "12", "8"),
            act_x = c("Short", "Buy", "Short", "Buy", "Short", "Buy", "Short", "Buy", "Short", "Buy"),
            act_y = c("Buy", "Short", "Buy", "Short", "Buy", "Short", "Buy", "Short", "Buy", "Short"),
            stringsAsFactors = FALSE)