Search code examples
python-3.xpandascsvgraphbinance

How to graph Binance API Orderbook with Pandas-matplotlib?


the data comes in 3 columns after (orderbook = pd.DataFrame(orderbook_data):

   timestamp                bids                      asks
 UNIX timestamp      [bidprice, bidvolume]   [askprice, askvolume]
        list has 100 values of each. timestamp is the same 

the problem is that I don't know how to access/index the values inside each row list [price, volume] of each column

I know that by running ---> bids = orderbook["bids"]

I get the list of 100 lists ---> [bidprice, bidvolume]

I'm looking to avoid doing a loop.... there has to be a way to just plot the data

I hope someone can undertand my problem. I just want to plot price on x and volume on y. The goal is to make it live


Solution

  • As you didn't present your input file, I prepared it on my own:

    timestamp;bids
    1579082401;[123.12, 300]
    1579082461;[135.40, 220]
    1579082736;[130.76, 20]
    1579082801;[123.12, 180]
    

    To read it I used:

    orderbook = pd.read_csv('Input.csv', sep=';')
    orderbook.timestamp = pd.to_datetime(orderbook.timestamp, unit='s')
    

    Its content is:

                 timestamp           bids
    0  2020-01-15 10:00:01  [123.12, 300]
    1  2020-01-15 10:01:13  [135.40, 220]
    2  2020-01-15 10:05:36   [130.76, 20]
    3  2020-01-15 10:06:41  [123.12, 180]
    

    Now:

    • timestamp has been converted to native pandasonic type of datetime,
    • but bids is of object type (actually, a string).

    and, as I suppose, this is the same when read from your input file.

    And now the main task: The first step is to extract both numbers from bids, convert them to float and int and save in respective columns:

    orderbook = orderbook.join(orderbook.bids.str.extract(
        r'\[(?P<bidprice>\d+\.\d+), (?P<bidvolume>\d+)]'))
    orderbook.bidprice = orderbook.bidprice.astype(float)
    orderbook.bidvolume = orderbook.bidvolume.astype(int)
    

    Now orderbook contains:

                timestamp           bids  bidprice  bidvolume
    0 2020-01-15 10:00:01  [123.12, 300]    123.12        300
    1 2020-01-15 10:01:01  [135.40, 220]    135.40        220
    2 2020-01-15 10:05:36   [130.76, 20]    130.76         20
    3 2020-01-15 10:06:41  [123.12, 180]    123.12        180
    

    and you can generate e.g. a scatter plot, calling:

    orderbook.plot.scatter('bidprice', 'bidvolume');
    

    or other plotting function.

    Another possibility

    Or maybe your orderbook_data is a dictionary? Something like:

    orderbook_data = {
        'timestamp': [1579082401, 1579082461, 1579082736, 1579082801],
        'bids': [[123.12, 300], [135.40, 220], [130.76, 20], [123.12, 180]] }
    

    In this case, when you create a DataFrame from it, the column types are initially:

    • timestamp - int64,
    • bids - also object, but this time each cell contains a plain pythonic list.

    Then you can also convert timestamp column to datetime just like above.

    But to split bids (a column of lists) into 2 separate columns, you should run:

    orderbook[['bidprice', 'bidvolume']] = pd.DataFrame(orderbook.bids.tolist())
    

    Then you have 2 new columns with respective components of the source column and you can create your graphics jus like above.