Tabulating Profit and Loss for Backtesting Using BigQuery


I have this BigQuery dataframe where a 1 in long_entry or short_entry means entering a trade at that time with a long or short position, respectively, while a 1 in long_exit or short_exit means exiting the trade. I would like to add two new columns: long_pnl, which tabulates the PnL generated by each individual long trade, and short_pnl, which tabulates the PnL generated by each individual short trade.

At most one trade/position is open at any point in time in this backtest.

Below is my dataframe. As we can see, a long trade is entered on 26/2/2019 and closed on 1/3/2019 for a PnL of $64.45 (4210.12 - 4145.67), while a short trade is entered on 4/3/2019 and closed on 5/3/2019 for a PnL of -$119.11 (4100.12 - 4219.23, a loss).

        date    price       long_entry  long_exit   short_entry short_exit
0   24/2/2019   4124.25           0          0           0              0
1   25/2/2019   4130.67           0          0           0              0
2   26/2/2019   4145.67           1          0           0              0
3   27/2/2019   4180.10           0          0           0              0
4   28/2/2019   4200.05           0          0           0              0
5   1/3/2019    4210.12           0          1           0              0
6   2/3/2019    4198.10           0          0           0              0
7   3/3/2019    4210.34           0          0           0              0
8   4/3/2019    4100.12           0          0           1              0
9   5/3/2019    4219.23           0          0           0              1

I hope to have an output like this, with another column for short_pnl:

        date    price       long_entry  long_exit   short_entry short_exit  long_pnl         
0   24/2/2019   4124.25           0          0           0             0    NaN  
1   25/2/2019   4130.67           0          0           0             0    NaN
2   26/2/2019   4145.67           1          0           0             0  64.45
3   27/2/2019   4180.10           0          0           0             0    NaN
4   28/2/2019   4200.05           0          0           0             0    NaN
5   1/3/2019    4210.12           0          1           0             0    NaN
6   2/3/2019    4198.10           0          0           0             0    NaN
7   3/3/2019    4210.34           0          0           0             0    NaN
8   4/3/2019    4100.12           0          0           1             0    NaN
9   5/3/2019    4219.23           0          0           0             1    NaN

Solution

  • The query below is for BigQuery Standard SQL:

    #standardSQL
    WITH temp1 AS (
      SELECT PARSE_DATE('%d/%m/%Y', dt) dt, CAST(price AS numeric) price, long_entry, long_exit, short_entry, short_exit
      FROM `project.dataset.table`
    ), temp2 AS (
      SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
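        -- Trade id = entries up to and including this row + exits strictly before it,
        -- so an exit row stays in the trade it closes. IFNULL covers the first row, whose preceding frame is empty.
        SUM(long_entry) OVER(ORDER BY dt) + IFNULL(SUM(long_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) long_grp,
        SUM(short_entry) OVER(ORDER BY dt) + IFNULL(SUM(short_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) short_grp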
      FROM temp1
    )
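    -- With ORDER BY dt DESC and the default frame (start of partition to current row),
    -- FIRST_VALUE is the price on the latest row of the trade group (the exit row once the trade is closed)
    -- and LAST_VALUE is the current row's price, i.e. the entry price on the rows kept by the IF.
    SELECT dt, price, long_entry, long_exit, short_entry, short_exit,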
      IF(long_entry = 0, NULL, 
        FIRST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC) - 
        LAST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC)
      ) long_pnl,
      IF(short_entry = 0, NULL, 
        LAST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC) - 
        FIRST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC)
      ) short_pnl
    FROM temp2
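
To make the grouping step in temp2 easier to follow, here is a minimal Python sketch of the same idea: a trade id built from the running sum of entries up to the current row plus the running sum of exits strictly before it. The function name `trade_groups` is hypothetical, the flags are copied from the long columns of the question, and the sketch assumes the empty frame on the first row counts as 0.

```python
from itertools import accumulate


def trade_groups(entries, exits):
    """Label each row with a trade-group id, mirroring long_grp / short_grp.

    group = running sum of entries up to and including the row
          + running sum of exits strictly before the row
    """
    entries_so_far = list(accumulate(entries))
    # Shift the running exit count down by one row so an exit row
    # still belongs to the trade it closes.
    exits_before = [0] + list(accumulate(exits))[:-1]
    return [e + x for e, x in zip(entries_so_far, exits_before)]


# Flags from the question's sample data, ordered by date.
long_entry = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
long_exit = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

long_grp = trade_groups(long_entry, long_exit)
print(long_grp)  # [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Rows 26/2/2019 through 1/3/2019 share group 1, so the entry and exit prices of that trade end up in the same partition, which is what the FIRST_VALUE / LAST_VALUE step relies on.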
    

    Applied to the sample data in your question:

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT '24/2/2019' dt, 4124.25 price, 0 long_entry, 0 long_exit, 0 short_entry, 0 short_exit UNION ALL
      SELECT '25/2/2019', 4130.67, 0, 0, 0, 0 UNION ALL
      SELECT '26/2/2019', 4145.67, 1, 0, 0, 0 UNION ALL
      SELECT '27/2/2019', 4180.10, 0, 0, 0, 0 UNION ALL
      SELECT '28/2/2019', 4200.05, 0, 0, 0, 0 UNION ALL
      SELECT '1/3/2019', 4210.12, 0, 1, 0, 0 UNION ALL
      SELECT '2/3/2019', 4198.10, 0, 0, 0, 0 UNION ALL
      SELECT '3/3/2019', 4210.34, 0, 0, 0, 0 UNION ALL
      SELECT '4/3/2019', 4100.12, 0, 0, 1, 0 UNION ALL
      SELECT '5/3/2019', 4219.23, 0, 0, 0, 1 
    ), temp1 AS (
      SELECT PARSE_DATE('%d/%m/%Y', dt) dt, CAST(price AS numeric) price, long_entry, long_exit, short_entry, short_exit
      FROM `project.dataset.table`
    ), temp2 AS (
      SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
        -- IFNULL covers the first row, whose preceding frame is empty.
        SUM(long_entry) OVER(ORDER BY dt) + IFNULL(SUM(long_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) long_grp,
        SUM(short_entry) OVER(ORDER BY dt) + IFNULL(SUM(short_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) short_grp
      FROM temp1
    )
    SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
      IF(long_entry = 0, NULL, 
        FIRST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC) - 
        LAST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC)
      ) long_pnl,
      IF(short_entry = 0, NULL, 
        LAST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC) - 
        FIRST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC)
      ) short_pnl
    FROM temp2
    ORDER BY dt
    

    The result is:

    Row dt          price   long_entry  long_exit   short_entry short_exit  long_pnl    short_pnl    
    1   2019-02-24  4124.25 0           0           0           0           null        null     
    2   2019-02-25  4130.67 0           0           0           0           null        null     
    3   2019-02-26  4145.67 1           0           0           0           64.45       null     
    4   2019-02-27  4180.1  0           0           0           0           null        null     
    5   2019-02-28  4200.05 0           0           0           0           null        null     
    6   2019-03-01  4210.12 0           1           0           0           null        null     
    7   2019-03-02  4198.1  0           0           0           0           null        null     
    8   2019-03-03  4210.34 0           0           0           0           null        null     
    9   2019-03-04  4100.12 0           0           1           0           null        -119.11  
    10  2019-03-05  4219.23 0           0           0           1           null        null     
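
As a sanity check on the long_pnl and short_pnl values, the same numbers can be reproduced outside BigQuery with a small row-by-row loop. This is only a sketch under the question's assumptions (at most one open position, rows sorted by date); the helper name `trade_pnl` and the tuple layout are hypothetical, and `Decimal` is used so the subtraction is exact.

```python
from decimal import Decimal

# (date, price, long_entry, long_exit, short_entry, short_exit), from the question
ROWS = [
    ("24/2/2019", "4124.25", 0, 0, 0, 0),
    ("25/2/2019", "4130.67", 0, 0, 0, 0),
    ("26/2/2019", "4145.67", 1, 0, 0, 0),
    ("27/2/2019", "4180.10", 0, 0, 0, 0),
    ("28/2/2019", "4200.05", 0, 0, 0, 0),
    ("1/3/2019", "4210.12", 0, 1, 0, 0),
    ("2/3/2019", "4198.10", 0, 0, 0, 0),
    ("3/3/2019", "4210.34", 0, 0, 0, 0),
    ("4/3/2019", "4100.12", 0, 0, 1, 0),
    ("5/3/2019", "4219.23", 0, 0, 0, 1),
]


def trade_pnl(rows, entry_col, exit_col, sign):
    """Return one value per row: the trade's PnL on its entry row, None elsewhere.

    sign is +1 for long trades (exit - entry) and -1 for short trades (entry - exit).
    """
    pnl = [None] * len(rows)
    open_idx = None  # index of the entry row of the currently open trade
    for i, row in enumerate(rows):
        if row[entry_col] == 1 and open_idx is None:
            open_idx = i
        elif row[exit_col] == 1 and open_idx is not None:
            entry_price = Decimal(rows[open_idx][1])
            exit_price = Decimal(row[1])
            pnl[open_idx] = sign * (exit_price - entry_price)
            open_idx = None
    return pnl


long_pnl = trade_pnl(ROWS, entry_col=2, exit_col=3, sign=1)
short_pnl = trade_pnl(ROWS, entry_col=4, exit_col=5, sign=-1)
print(long_pnl[2], short_pnl[8])  # 64.45 -119.11
```

A trade that is never closed keeps None on its entry row in this sketch, which is a design choice of the example rather than something the query above does.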
    

    I feel there should be a shorter solution, but the above should be good enough to use.