I have hourly price data in dataframe in which I need to subtract all permutations to find the best pairing for a financial trade. Each column (not including pricedate, hour) can be considered the closing price of that stock for that particular pricedate and hour. This is the data:
test <- data.frame(pricedate = as.Date('2019-12-18'), hour = c(1,2,3,4,5), A = c(3,5,6,4,2), B = c(5,3,2,6,7), C = c(1,2,3,6,9))
I want to get a new dataframe (or table) of the difference between all permutation combinations. Therefore, "A subtract B" is different than "B subtract A". And I don't need a column subtracted from itself. The resulting table would look something like this:
Pricedate Hour A-B A-C B-A B-C C-A C-B
2019-12-18 1 -2 2 2 4 -2 -4
2019-12-18 2 2 3 -2 1 -3 -1
.
.
.
I believe I need the data to stay in this form because I'd like to calculate some financial statistics in R after this.
Here's a tidyverse approach. First, we convert to longer form, where each column A:C is represented in a new row, and the designation of which column it came from goes in a new column called "col". Then we join that table to itself, so every row is combined with all the rows which share the same date and hour.
Then we calc the difference, filter out the rows where rows are subtracted from themselves, unite the two column headers into a single identifying column, and convert back to wide format.
library(tidyverse)
test_longer <- test %>%
pivot_longer(A:C, names_to = "col", values_to = "val")
test_longer %>%
left_join(test_longer,
suffix = c("1", "2"),
by = c("pricedate", "hour")) %>%
filter(col1 != col2) %>%
mutate(dif = val1 - val2) %>%
unite("col", c(col1, col2), sep = "-") %>%
select(-c(val1, val2)) %>%
pivot_wider(names_from = col, values_from = dif)
# A tibble: 5 x 8
pricedate hour `A-B` `A-C` `B-A` `B-C` `C-A` `C-B`
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-12-18 1 -2 2 2 4 -2 -4
2 2019-12-18 2 2 3 -2 1 -3 -1
3 2019-12-18 3 4 3 -4 -1 -3 1
4 2019-12-18 4 -2 -2 2 0 2 0
5 2019-12-18 5 -5 -7 5 -2 7 2