Search code examples
rdataframesubtotal

Netting in a dataframe


I have a dataframe which I would like to clean by removing some offsetting lines (boxed positions) and doing some netting.

Here is the source table:

    Type  Name     Strike  Maturity    Nominal
    Call  Amazon    10     10/12/2018  1000
    Put   Amazon    10     10/12/2018  1000
    Call  Ebay      8      2/8/2018    800
    Put   Ebay      8      2/8/2018    500
    Call  Facebook  5      5/5/2018    900
    Call  Google    2      23/4/2018   250
    Put   Google    2      23/4/2018   350
    Call  Microsoft 2      19/3/2018   250
    Put   Microsoft 2.5    19/3/2018   350
    Put   Ebay      8      2/8/2018    100

And the result of the code here:

    Type  Name      Strike  Maturity   Nominal
    Call  Ebay      8       2/8/2018   200
    Call  Facebook  5       5/5/2018   900
    Put   Google    2       23/4/2018  100
    Call  Microsoft 2       19/3/2018  250
    Put   Microsoft 2.5     19/3/2018  350

I'm trying to write a code in R that would perform these 3 tasks:

1// Remove all the pairs that offset each other. A pair that offset each other is a pair that meet these 2 criteria:

  • 2 lines that have the same Name, Strike, Maturity and Nominal.
  • 1 line is a "Call" while the other one is a "Put"

Example: the 2 "Amazon" lines that were removed from the table

2// Do a netting on the nominal for the lines that don't perfectly offset each other. A pair that don't perfectly offset each other is a pair that meet these 2 criteria:

  • 2 lines that have the same Name, Strike and Maturity but different Nominal
  • 1 line is a "Call" while the other one is a "Put"

Example: the 2 "Ebay" lines that were netted on the Call or the 2 "Google" lines that were netted on the Put.

3// Don't do anything on all the other lines

Example: the 2 "Microsoft" lines. They have different strike so no netting at all should be done

Please see below my first attempt. My idea was first to create a new column with a unique key, then sorting alphabetically and then testing each line one by one. I find it very laborious so I was wondering if someone could help me find a more straightforward and efficient solution? Many thanks!

library(data.table)

dt <- data.table(Type=c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put","Put"),
                 Name=c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google", "Microsoft", "Microsoft","Ebay"),
                 Strike=c(10,10,8,8,5,2,2,2,2.5,8),
                 Maturity=c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018", "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018","2/8/2018),
                 Nominal=c(1000,1000,800,500,900,250,350,250,35,100))

##idea
dt$key <- paste(dt$Name,dt$Strike,dt$Maturity)
dt[order(dt$key,decreasing = FALSE),]
dt$Type2 <- ifelse(dt$Type = "Call",1,0)

#for each line k, test value in the column "Key" and the column "Type2":
#if key(k) = key(k+1) and Type2(k)+Type2(k+1)=1 then 
    #if Nominal (k)> Nominal (k+1), delete the line k+1 and do the netting on nominal of the line k
    #else Nomnial (k+1)< Nominal (k), delete the line k and do the netting on nominal of the line k+1
#next k

dt <- dt[dt$Nominal!=0,]
dt$key <- NULL

After ideas that were recommended, I tried the dcast solution but it looks like it does not do the proper netting as shown below:

> dt <- data.table(Type=c("Call", "Put", "Call", "Put", "Call", "Call", "Put", "Call", "Put","Put"),
+                  Name=c("Amazon", "Amazon", "Ebay", "Ebay", "Facebook", "Google", "Google", "Microsoft", "Microsoft","Ebay"),
+                  Strike=c(10,10,8,8,5,2,2,2,2.5,8),
+                  Maturity=c("10/12/2018", "10/12/2018", "2/8/2018", "2/8/2018", "5/5/2018", "23/4/2018", "23/4/2018", "19/3/2018", "19/3/2018","2/8/2018"),
+                  Nominal=c(1000,1000,800,500,900,250,350,250,350,100))
> dcast(dt, Name + Maturity + Strike ~ Type, value.var="Nominal", fill = 0)[, Net := Call - Put][Net != 0]
Aggregate function missing, defaulting to 'length'
        Name  Maturity Strike Call Put Net
1:      Ebay  2/8/2018    8.0    1   2  -1
2:  Facebook  5/5/2018    5.0    1   0   1
3: Microsoft 19/3/2018    2.0    1   0   1
4: Microsoft 19/3/2018    2.5    0   1  -1

Solution

  • Here is a tidyverse solution. Basically, since you want to group all rows that have the same Name, Strike and Maturity, I think it's simplest to convert Call and Put into actual numbers and use summarise. Your special offset case is really just removing net cases where the total ends up being 0.

    Approach is:

    1. Convert Put into negative values of Nominal using ifelse and mutate,
    2. Use group_by and summarise to reduce the groups into a single value per group`,
    3. Remove perfect offsets with filter,
    4. Replace the Type column and make the negative values positive.

    Code:

    library(tidyverse)
    tbl <- read_table2(
      "Type  Name     Strike  Maturity    Nominal
      Call  Amazon    10     10/12/2018  1000
      Put   Amazon    10     10/12/2018  1000
      Call  Ebay      8      2/8/2018    800
      Put   Ebay      8      2/8/2018    500
      Call  Facebook  5      5/5/2018    900
      Call  Google    2      23/4/2018   250
      Put   Google    2      23/4/2018   350
      Call  Microsoft 2      19/3/2018   250
      Put   Microsoft 2.5    19/3/2018   350
      Put   Ebay      8      2/8/2018    100"
    )
    
    tbl %>%
      mutate(actual = ifelse(Type == "Call", Nominal, -Nominal)) %>%
      group_by(Name, Strike, Maturity) %>%
      summarise(Net = sum(actual)) %>%
      filter(Net != 0) %>%
      mutate(
        Type = ifelse(Net > 0, "Call", "Put"),
        Net = abs(Net)
        )
    # A tibble: 5 x 5
    # Groups:   Name, Strike [5]
      Name      Strike Maturity    Net Type 
      <chr>      <dbl> <chr>     <int> <chr>
    1 Ebay        8.00 2/8/2018    200 Call 
    2 Facebook    5.00 5/5/2018    900 Call 
    3 Google      2.00 23/4/2018   100 Put  
    4 Microsoft   2.00 19/3/2018   250 Call 
    5 Microsoft   2.50 19/3/2018   350 Put