python, pandas, pytables

pandas HDFStore tables don't accept MultiIndex columns


This works fine:

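import numpy as np
import pandas as pd

cols = ['X', 'Y']
ind = [('A', 1), ('B', 2)]
ind = pd.MultiIndex.from_tuples(ind, names=['foo', 'number'])

df = pd.DataFrame(np.random.rand(2, 2), columns=cols, index=ind)
store = pd.HDFStore('store.h5')
store.put('df', df, format='table')  # older pandas spelled this table=True
print(store['df'])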

               X         Y
foo number                    
A   1       0.015005  0.213427
B   2       0.090311  0.595418

This breaks:

import numpy as np
import pandas as pd

cols = [('X', 1), ('Y', 2)]
cols = pd.MultiIndex.from_tuples(cols, names=['bar', 'number'])
ind = [('A', 1), ('B', 2)]
ind = pd.MultiIndex.from_tuples(ind, names=['foo', 'number'])

df = pd.DataFrame(np.random.rand(2, 2), columns=cols, index=ind)
store = pd.HDFStore('store.h5')
store.put('df', df, format='table')  # older pandas spelled this table=True
print(store['df'])

KeyError: u'no item named foo'
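Since a MultiIndex on the index alone writes fine, one possible workaround is to flatten the column MultiIndex into plain string labels before storing and rebuild it after reading. This is a minimal sketch, not the answerer's method: `flatten_columns`, `restore_columns` and the `"|"` separator are hypothetical choices, and it assumes the separator never appears inside a column label. The per-level `converters` are needed because the string round trip loses the original level dtypes.

```python
import numpy as np
import pandas as pd

SEP = "|"  # hypothetical separator; assumed never to occur inside a column label


def flatten_columns(df, sep=SEP):
    """Return a copy of df whose MultiIndex column labels are joined into single strings."""
    out = df.copy()
    out.columns = [sep.join(str(part) for part in label) for label in df.columns]
    return out


def restore_columns(df, names, converters, sep=SEP):
    """Rebuild a column MultiIndex from flattened string labels.

    converters holds one callable per level (e.g. [str, int]) to recover
    the level dtypes that were lost when the labels became strings.
    """
    tuples = [
        tuple(conv(part) for conv, part in zip(converters, label.split(sep)))
        for label in df.columns
    ]
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(tuples, names=names)
    return out


cols = pd.MultiIndex.from_tuples([('X', 1), ('Y', 2)], names=['bar', 'number'])
ind = pd.MultiIndex.from_tuples([('A', 1), ('B', 2)], names=['foo', 'number'])
df = pd.DataFrame(np.random.rand(2, 2), columns=cols, index=ind)

flat = flatten_columns(df)  # columns become 'X|1', 'Y|2'; the row MultiIndex is untouched
# flat now has the same shape as the working case, so it could be written with
# store.put('df', flat, format='table') (requires PyTables)
restored = restore_columns(flat, ['bar', 'number'], [str, int])
```

The trade-off is that the column level names and converters must be remembered outside the stored table (for example in your own code or as node attributes), since the flattened table no longer carries them.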

I suspect this is a known limitation of PyTables, but I couldn't find anything in the pandas docs stating that a MultiIndex is supported only on the index and not on the columns.


Solution

  • This is not supported, i.e. having BOTH a column MultiIndex and an index MultiIndex. Either one alone works. However, in general a column MultiIndex is not very useful, as it's impossible to select from it without some really odd syntax (the columns are stored as tuples, so they have to be explicitly selected). So I wouldn't recommend it in any event.

    I'll open an issue to support both, since it currently raises in any event; see https://github.com/pydata/pandas/issues/5823
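Until that limitation is lifted, a small pre-write guard can replace the confusing `KeyError` with an explicit message. This is a minimal sketch: `check_table_compatible` is a hypothetical helper, and the rule it encodes (reject a MultiIndex on both axes, allow either alone) is taken from the answer above for the pandas version discussed; newer releases may behave differently.

```python
import pandas as pd


def check_table_compatible(df):
    """Hypothetical guard: raise early if df has a MultiIndex on both axes,
    the combination reported above as unsupported by HDFStore tables."""
    if isinstance(df.index, pd.MultiIndex) and isinstance(df.columns, pd.MultiIndex):
        raise ValueError(
            "HDFStore table format: a MultiIndex on both the index and the columns "
            "is not supported; flatten one of the axes before writing"
        )
    return df  # returned unchanged so the call can be chained before store.put
```

Usage would be along the lines of `store.put('df', check_table_compatible(df), format='table')`.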