python, pandas, generator

Python generator to pandas DataFrame


I have a generator being returned from:

data = public_client.get_product_trades(product_id='BTC-USD', limit=10)

How do I turn the data into a pandas DataFrame?

The method's docstring reads:

"""{"Returns": [{
                     "time": "2014-11-07T22:19:28.578544Z",
                     "trade_id": 74,
                     "price": "10.00000000",
                     "size": "0.01000000",
                     "side": "buy"
                 }, {
                     "time": "2014-11-07T01:08:43.642366Z",
                     "trade_id": 73,
                     "price": "100.00000000",
                     "size": "0.01000000",
                     "side": "sell"
         }]}"""

I have tried:

df = [x for x in data]
df = pd.DataFrame.from_records(df)

but it does not work; I get the error:

AttributeError: 'str' object has no attribute 'keys'

When I print the items of data, I see a list of dicts, but the end looks strange. Could this be why?

print(list(data))

[{'time': '2020-12-30T13:04:14.385Z', 'trade_id': 116918468, 'price': '27853.82000000', 'size': '0.00171515', 'side': 'sell'},{'time': '2020-12-30T12:31:24.185Z', 'trade_id': 116915675, 'price': '27683.70000000', 'size': '0.01683711', 'side': 'sell'}, 'message']

It looks like a list of dicts, but the last value is a single string, 'message'.


Solution

  • Based on the updated question:

    df = pd.DataFrame(list(data)[:-1])
    

    Or, more cleanly, keep only the items that are dicts (this works wherever the stray string appears):

    df = pd.DataFrame([x for x in data if isinstance(x, dict)])
    print(df)
    
                           time   trade_id           price        size  side
    0  2020-12-30T13:04:14.385Z  116918468  27853.82000000  0.00171515  sell
    1  2020-12-30T12:31:24.185Z  116915675  27683.70000000  0.01683711  sell
    

    Oh, and BTW, the time, price, and size columns are still strings at this point, so you'll need to convert them into something usable, e.g.:

    df['time'] = pd.to_datetime(df['time'])
    for k in ['price', 'size']:
        df[k] = pd.to_numeric(df[k])
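
Putting the steps above together, here is a minimal, self-contained sketch. The fake_trades generator and the trades_to_frame helper are hypothetical names made up for illustration; fake_trades only imitates the output shown in the question (two trade dicts followed by the stray string 'message') and is not the real API client.

```python
import pandas as pd


def fake_trades():
    """Hypothetical stand-in for the API generator: two trade dicts, then a stray string."""
    yield {"time": "2020-12-30T13:04:14.385Z", "trade_id": 116918468,
           "price": "27853.82000000", "size": "0.00171515", "side": "sell"}
    yield {"time": "2020-12-30T12:31:24.185Z", "trade_id": 116915675,
           "price": "27683.70000000", "size": "0.01683711", "side": "sell"}
    yield "message"


def trades_to_frame(records):
    """Build a DataFrame from the dict items only, then convert the string columns."""
    df = pd.DataFrame([r for r in records if isinstance(r, dict)])
    if df.empty:
        # Nothing usable came back; skip the conversions to avoid a KeyError.
        return df
    df["time"] = pd.to_datetime(df["time"])
    for col in ["price", "size"]:
        df[col] = pd.to_numeric(df[col])
    return df


df = trades_to_frame(fake_trades())
print(df.dtypes)
```

Keep in mind that a generator can only be consumed once: if you call list(data) to inspect it, a second pass over the same data object yields nothing, so build the DataFrame from the first (and only) pass.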