Search code examples
pythonpandasevalalgorithmic-tradingccxt

Pandas: how to divide all values in one DataFrame by the values in another DataFrame?


[EDIT] I actually figured this out, and you can see my answer below. ]

Before you reference the other question with almost the same title, I have already looked at it and tried out their solution without luck. I'm not sure exactly what I'm doing wrong, but I'm getting a different error than what was in the other question.

What I'm trying to do is create an artificial trading pair based on two underlying trading pairs. This is for pairs trading.

Eg. 'ETHUSD/BTCUSD'

It's possible to see charts like this on TradingView and I'm trying to do the same calculation locally.

So, I have the OHLCV data for ETHUSD and for BTCUSD saved in two DataFrames. I'm trying to merge them and divide each OHLC value from the base pair (eg. 'ETHUSD' by the second pair eg 'BTCUSD'

Here is what I've written so far which gives me a TypeError:

def create_synthetic_pair(base, quote, timeframe, limit):
    """
      This is not working yet. 
    """
    base_bars = exchange.fetch_ohlcv(base, timeframe, limit)
    quote_bars = exchange.fetch_ohlcv(quote, timeframe, limit)

    df_base = pd.DataFrame(base_bars[:-1], columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
    df_base['timestamp'] = pd.to_datetime(df_base['timestamp'], unit='ms')
    
    df_quote = pd.DataFrame(quote_bars[:-1], columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
    df_quote['timestamp'] = pd.to_datetime(df_quote['timestamp'], unit='ms')
    
    df_synth = (df_base.merge(df_quote, on='timestamp', how='left', suffixes=['_base', '_quote']) #how='left', sort=False)
                  .eval("""
                        open=open_base/open_quote
                        high=high_base/high_quote
                        low=low_base/low_quote
                        close=close_base/close_quote
                        """)
                )
    
    return df_synth

When I run that code I get the following error:

File "/home/jonathon/Developer/*****/*****/*****/main.py", line 105, in run_bot
    create_synthetic_pair('ETH/USDT', 'BTC/USDT', "1m", 100)
  File "/home/jonathon/Developer/*****/*****/*****/main.py", line 131, in create_synthetic_pair
    .eval("""
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/frame.py", line 4234, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/eval.py", line 350, in eval
    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 811, in __init__
    self.terms = self.parse()
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 830, in parse
    return self._visitor.visit(self.expr)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 421, in visit_Module
    return self.visit(expr, **kwargs)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 636, in visit_Assign
    return self.visit(node.value, **kwargs)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 538, in visit_BinOp
    return self._maybe_evaluate_binop(op, op_class, left, right)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 505, in _maybe_evaluate_binop
    res = op(lhs, rhs)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/expr.py", line 541, in <lambda>
    return lambda lhs, rhs: Div(lhs, rhs)
  File "/home/jonathon/.local/share/virtualenvs/*****-r-7MCNdX/lib/python3.10/site-packages/pandas/core/computation/ops.py", line 537, in __init__
    raise TypeError(
TypeError: unsupported operand type(s) for /: 'object' and 'object'

I don't understand why I'm getting a type error when these should be float variables, not objects

What am I missing?


Solution

  • So, I actually figured this out on my own. I'll share my answer in case anybody else is trying to do the same thing.

    Actually, the TypeError was a result of having an empty DataFrame, which was nothing to do with pandas and happened because I was incorrectly calling the exchange API.

    For anyone interested, I'm using ccxt, and I changed the beginning from this:

    def create_synthetic_pair(base, quote, timeframe, limit):
    
        base_bars = exchange.fetch_ohlcv(base, timeframe, limit)
        quote_bars = exchange.fetch_ohlcv(quote, timeframe, limit)
    

    to:

    def create_synthetic_pair(base, quote, _timeframe, _limit):
    
        base_bars = exchange.fetch_ohlcv(base, timeframe=_timeframe, limit=_limit)
        quote_bars = exchange.fetch_ohlcv(quote, timeframe=_timeframe, limit=_limit)
    

    That fixed the TypeError. However, it still wasn't working properly, and bizarrely it worked on each of the columns except for the 'close' column. I ended up changing this part of the function:

    df_synth = (df_base.merge(df_quote, on='timestamp', how='left', suffixes=['_base', '_quote']) #how='left', sort=False)
                      .eval("""
                            open=open_base/open_quote
                            high=high_base/high_quote
                            low=low_base/low_quote
                            close=close_base/close_quote
                            """)
                    )
    

    into:

    df_synth = (df_base.merge(df_quote, on='timestamp', how='left', suffixes=['_base', '_quote'], sort=False)
                      .pipe(lambda x: x.assign(open=x.open_base/x.open_quote, 
                                               high=x.high_base/x.high_quote,
                                               low=x.low_base/x.low_quote,
                                               close=x.close_base/x.close_quote,
                                               volume=x.volume_base/x.volume_quote))
                    )
    

    Then I ended up with the correct OHLCV data, but with a bunch of extra unwanted columns from the base and quote DataFrames.

    To remove those columns, I added this underneath:

    for name in df_synth.iteritems():
          if name[0].endswith('_base') or name[0].endswith('_quote'):
            df_synth = df_synth.drop(name[0] , 1)
    

    And now it works!

    However, I actually still didn't get it to work with .eval() and instead used .pipe(), so if anyone wants to comment on how it's possible to do the same with .eval() then that would be welcome, since that's probably a more elegant solution.

    I think I will change the title of this to "Pandas: How to make a synthetic trading pair OHLCV from two underlying trading pairs"

    (If you can think of a less verbose title which still captures that meaning, let me know)

    Don't know if anyone will need this, but the final working function I came up with is:

    def create_synthetic_pair(base_bars, quote_bars): 
        """
          This takes raw kline data from calling
          ccxt.exchange.fetch_ohlcv() as inputs,
          turns them into DataFrames, then divides 
          each base OHLCV data point value by its 
          corresponding quote value, and returns
          a new DataFrame with the new OHLCV values.
        """
        global exchange
    
        df_base = pd.DataFrame(base_bars[:-1], columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
        df_base['timestamp'] = pd.to_datetime(df_base['timestamp'], unit='ms')
        
        df_quote = pd.DataFrame(quote_bars[:-1], columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
        df_quote['timestamp'] = pd.to_datetime(df_quote['timestamp'], unit='ms')
    
        df_synth = (df_base.merge(df_quote, on='timestamp', how='left', suffixes=['_base', '_quote'], sort=False)
                      .pipe(lambda x: x.assign(open=x.open_base/x.open_quote, 
                                               high=x.high_base/x.high_quote,
                                               low=x.low_base/x.low_quote,
                                               close=x.close_base/x.close_quote,
                                               volume=x.volume_base/x.volume_quote))
                    )
        for name in df_synth.iteritems():
          if name[0].endswith('_base') or name[0].endswith('_quote'):
            df_synth = df_synth.drop(name[0] , 1)
    
        print("\n------------------------------------------------\n"
              +f"[df_base]\n{df_base}"
              +"\n------------------------------------------------\n"
              +f"[df_quote]\n{df_quote}"
              +"\n------------------------------------------------\n"
              +f"[df_synth]\n{df_synth}"
              +"\n------------------------------------------------\n")
        return df_synth