Tags: python, pandas, dataframe, list, namedtuple

How do I create a pandas DataFrame (with an index or MultiIndex) from a list of namedtuple instances?


Simple example:

from collections import namedtuple
import pandas

Price = namedtuple('Price', 'ticker date price')
a = Price('GE', '2010-01-01', 30.00)
b = Price('GE', '2010-01-02', 31.00)
l = [a, b]
df = pandas.DataFrame.from_records(l, index='ticker')

Traceback (most recent call last):
...
KeyError: 'ticker'

Harder example:

df2 = pandas.DataFrame.from_records(l, index=['ticker', 'date'])
df2

         0           1   2
ticker  GE  2010-01-01  30
date    GE  2010-01-02  31

Now pandas treats ['ticker', 'date'] as the index labels themselves, rather than as the names of the columns I want to use as the index.

Is there a way to do this without resorting to an intermediate numpy ndarray or using set_index after the fact?


Solution

  • To get a Series from a namedtuple, you can use its _fields attribute (pandas is imported as pd here):

    In [11]: pd.Series(a, a._fields)
    Out[11]:
    ticker            GE
    date      2010-01-01
    price             30
    dtype: object
    

    Similarly you can create a DataFrame like this:

    In [12]: df = pd.DataFrame(l, columns=l[0]._fields)
    
    In [13]: df
    Out[13]:
      ticker        date  price
    0     GE  2010-01-01     30
    1     GE  2010-01-02     31
    

    You still have to call set_index afterwards, but you can do it in place:

    In [14]: df.set_index(['ticker', 'date'], inplace=True)
    
    In [15]: df
    Out[15]:
                       price
    ticker date
    GE     2010-01-01     30
           2010-01-02     31
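
For reference, here is a minimal self-contained sketch of the steps above as a script. It assumes a reasonably recent pandas. The name `records` is introduced here for illustration, and the second variant (passing `columns=` to `from_records` so that the `index=` names can be matched against real column names) is not part of the original answer; treat it as an assumption to check against your pandas version.

```python
from collections import namedtuple

import pandas as pd

Price = namedtuple("Price", "ticker date price")

# Hypothetical name; this is the list called `l` in the question.
records = [
    Price("GE", "2010-01-01", 30.00),
    Price("GE", "2010-01-02", 31.00),
]

# Approach from the answer, chained instead of inplace=True:
# set_index returns a new DataFrame with a (ticker, date) MultiIndex.
df_a = pd.DataFrame(records, columns=list(Price._fields)).set_index(
    ["ticker", "date"]
)

# Assumed alternative: name the columns up front, so from_records can
# resolve 'ticker' and 'date' as columns instead of raising KeyError
# or treating them as index labels.
df_b = pd.DataFrame.from_records(
    records, columns=list(Price._fields), index=["ticker", "date"]
)

print(df_a)
print(df_b)
```

The errors in the question appear to come from the columns being unnamed (0, 1, 2) at the point where `index=` is resolved; supplying the field names first is what lets either variant find them.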