Search code examples
pandasohlcohlcv

Combining OHLCV price series of multiple trading pairs by volume-weighting


I have multiple trading pairs for the same asset in the familiar OHLCV format. Data is indexed by (pair, timestamp) tuples and contains OHLCV columns.

How can I calculate volume-weighted average prices and total volumes across all trading pairs in a single DataFrame?


Solution

  • To get the maximum performance you can do

    • Calculate first the total volume for each timestamp
    • Calculate weight for each pair entry in a timestamp based on its volume
    • Based on weights, calculate the resulting open/high/low/close

    Example:

    def calculate_volume_weighted_ohlcv(df: pd.DataFrame) -> pd.DataFrame:
        """Calculate volume weighted average prices (vwap) on OHLCV.
    
        - Assume input has multiple entries for each timestamp (index) separated by column "pair_id"
    
        - open, high, low, close columns are weighted by volume column
    
        - volume is the sum of all volume by timestamp
    
        - liquidity, if in the columns, is the sum of all liquidity by timestamp
    
        .. code-block:: text
    
                        pair_id  open  high  low  close  volume  liquidity
            timestamp
            2020-01-01        1   100   100  100    100     500         10
            2020-02-02        1   100   100  100    100     500         10
            2020-01-01        2   110   110  110    110     250         20
            2020-02-02        2   110   110  110    110     250         20
            2020-02-02        3   200   200  200    200    1000         30
    
    
        :param df:
            Must have columns pair_id, timestamp, open, high, low, close, volume and optionally liquidity
    
        :return:
            Aggregated open, high, low, close, volume, liquidity
        """
    
        assert isinstance(df.index, pd.DatetimeIndex)
    
        timestamp_agg = df.groupby(level='timestamp')
    
        # Calculate 0..1 weight for each (pair_id, timestamp) combo
        df["total_volume_in_timestamp"] = timestamp_agg.agg("volume").sum()
        df["weight"] = df["volume"] / df["total_volume_in_timestamp"]
    
        result_df = pd.DataFrame()
        df["open_weighted"] = (df["open"] * df["weight"])
        df["high_weighted"] = (df["high"] * df["weight"])
        df["low_weighted"] = (df["low"] * df["weight"])
        df["close_weighted"] = (df["close"] * df["weight"])
    
        grouped_2 = df.groupby("timestamp")
        result_df["open"] = grouped_2["open_weighted"].sum()
        result_df["high"] = grouped_2["high_weighted"].sum()
        result_df["low"] = grouped_2["low_weighted"].sum()
        result_df["close"] = grouped_2["close_weighted"].sum()
        result_df["volume"] = grouped_2["volume"].sum()
        if "liquidity" in df.columns:
            result_df["liquidity"] = grouped_2["liquidity"].sum()
        return result_df