I have a dataframe which I would like to clean by removing some offsetting lines (boxed positions) and doing some netting.
Here is the source table:
Type Name Strike Maturity Nominal
Call Amazon 10 10/12/2018 1000
Put Amazon 10 10/12/2018 1000
Call Ebay 8 2/8/2018 800
Put Ebay 8 2/8/2018 500
Call Facebook 5 5/5/2018 900
Call Google 2 23/4/2018 250
Put Google 2 23/4/2018 350
Call Microsoft 2 19/3/2018 250
Put Microsoft 2.5 19/3/2018 350
Put Ebay 8 2/8/2018 100
And the result of the code here:
Type Name Strike Maturity Nominal
Call Ebay 8 2/8/2018 200
Call Facebook 5 5/5/2018 900
Put Google 2 23/4/2018 100
Call Microsoft 2 19/3/2018 250
Put Microsoft 2.5 19/3/2018 350
I'm trying to write a code in R that would perform these 3 tasks:
1// Remove all the pairs that offset each other. A pair that offset each other is a pair that meet these 2 criteria:
Example: the 2 "Amazon" lines that were removed from the table
2// Do a netting on the nominal for the lines that don't perfectly offset each other. A pair that don't perfectly offset each other is a pair that meet these 2 criteria:
Example: the 2 "Ebay" lines that were netted on the Call or the 2 "Google" lines that were netted on the Put.
3// Don't do anything on all the other lines
Example: the 2 "Microsoft" lines. They have different strike so no netting at all should be done
Please see below my first attempt. My idea was first to create a new column with a unique key, then sorting alphabetically and then testing each line one by one. I find it very laborious so I was wondering if someone could help me find a more straightforward and efficient solution? Many thanks!
library(data.table)
dt <- data.table(Type=c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put","Put"),
Name=c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google", "Microsoft", "Microsoft","Ebay"),
Strike=c(10,10,8,8,5,2,2,2,2.5,8),
Maturity=c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018", "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018","2/8/2018),
Nominal=c(1000,1000,800,500,900,250,350,250,35,100))
##idea
dt$key <- paste(dt$Name,dt$Strike,dt$Maturity)
dt[order(dt$key,decreasing = FALSE),]
dt$Type2 <- ifelse(dt$Type = "Call",1,0)
#for each line k, test value in the column "Key" and the column "Type2":
#if key(k) = key(k+1) and Type2(k)+Type2(k+1)=1 then
#if Nominal (k)> Nominal (k+1), delete the line k+1 and do the netting on nominal of the line k
#else Nomnial (k+1)< Nominal (k), delete the line k and do the netting on nominal of the line k+1
#next k
dt <- dt[dt$Nominal!=0,]
dt$key <- NULL
After ideas that were recommended, I tried the dcast solution but it looks like it does not do the proper netting as shown below:
> dt <- data.table(Type=c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put","Put"),
+ Name=c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google", "Microsoft", "Microsoft","Ebay"),
+ Strike=c(10,10,8,8,5,2,2,2,2.5,8),
+ Maturity=c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018", "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018","2/8/2018"),
+ Nominal=c(1000,1000,800,500,900,250,350,250,350,100))
> dcast(dt, Name + Maturity + Strike ~ Type, value.var="Nominal", fill = 0)[, Net := Call - Put][Net != 0]
Aggregate function missing, defaulting to 'length'
Name Maturity Strike Call Put Net
1: Ebay 2/8/2018 8.0 1 2 -1
2: Facebook 5/5/2018 5.0 1 0 1
3: Microsoft 19/3/2018 2.0 1 0 1
4: Microsoft 19/3/2018 2.5 0 1 -1
Here is a tidyverse
solution. Basically, since you want to group all rows that have the same Name
, Strike
and Maturity
, I think it's simplest to convert Call
and Put
into actual numbers and use summarise
. Your special offset case is really just removing net cases where the total ends up being 0.
Approach is:
Put
into negative values of Nominal
using ifelse
and mutate
,group_by
and summarise
to reduce the groups into a single value per group`,filter
,Type
column and make the negative values positive.Code:
library(tidyverse)
tbl <- read_table2(
"Type Name Strike Maturity Nominal
Call Amazon 10 10/12/2018 1000
Put Amazon 10 10/12/2018 1000
Call Ebay 8 2/8/2018 800
Put Ebay 8 2/8/2018 500
Call Facebook 5 5/5/2018 900
Call Google 2 23/4/2018 250
Put Google 2 23/4/2018 350
Call Microsoft 2 19/3/2018 250
Put Microsoft 2.5 19/3/2018 350
Put Ebay 8 2/8/2018 100"
)
tbl %>%
mutate(actual = ifelse(Type == "Call", Nominal, -Nominal)) %>%
group_by(Name, Strike, Maturity) %>%
summarise(Net = sum(actual)) %>%
filter(Net != 0) %>%
mutate(
Type = ifelse(Net > 0, "Call", "Put"),
Net = abs(Net)
)
# A tibble: 5 x 5
# Groups: Name, Strike [5]
Name Strike Maturity Net Type
<chr> <dbl> <chr> <int> <chr>
1 Ebay 8.00 2/8/2018 200 Call
2 Facebook 5.00 5/5/2018 900 Call
3 Google 2.00 23/4/2018 100 Put
4 Microsoft 2.00 19/3/2018 250 Call
5 Microsoft 2.50 19/3/2018 350 Put