python pandas ipython

pandas 0.16.0: after changing the DataFrame index, all values become NaN


I am using IPython Notebook and following the pandas cookbook examples for release 0.16.0. I ran into trouble on page 237. I created a DataFrame like this:

from pandas import *
data1=DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})

Then I did this, trying to change the index:

df=DataFrame(data=data1,index=(['a','b','c','d']))

But what I get is a DataFrame in which every value is NaN! Does anyone know why, and how to fix it? I also tried the set_index function, and it gave me errors.

Thank you very much!


Solution

  • If you want to change the index labels, assign directly to the index (reindex looks rows up by their existing labels, so it cannot rename them):

    In [5]:
    
    import pandas as pd
    data1=pd.DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})
    print(data1)
    df=pd.DataFrame(data=data1)
    df.index = ['a','b','c','d']
    df
       AAA  BBB  CCC
    0    4   10  100
    1    5   20   50
    2    6   30  -30
    3    7   40  -50
    Out[5]:
       AAA  BBB  CCC
    a    4   10  100
    b    5   20   50
    c    6   30  -30
    d    7   40  -50
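
A minimal sketch of relabelling the rows without touching the original frame, assuming a reasonably recent pandas. The names `relabelled` and `renamed` are illustrative and do not come from the answer above.

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})

# Copy first so that data1 keeps its original 0..3 index,
# then overwrite the row labels on the copy.
relabelled = data1.copy()
relabelled.index = ['a', 'b', 'c', 'd']

# Alternative: map each old label to a new one; rename returns a new frame.
renamed = data1.rename(index={0: 'a', 1: 'b', 2: 'c', 3: 'd'})

print(relabelled)
print(renamed)
```

Both approaches keep the values and change only the labels, which is what the question was after.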
    

    I don't know whether it is a bug or not, but the following does work:

    In [7]:
    
    df=pd.DataFrame(data=data1.values,index=(['a','b','c','d']))
    df
    Out[7]:
       0   1    2
    a  4  10  100
    b  5  20   50
    c  6  30  -30
    d  7  40  -50
    

    So if you pass the underlying values rather than the df itself, the constructor does not try to align the data to the passed-in index. Note that the column names are lost, as the output above shows.
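
To make the difference concrete, here is a small sketch that builds the frame both ways. Passing `columns=data1.columns` alongside the raw values is an addition of mine to keep the column names; the variable names are illustrative.

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})
labels = ['a', 'b', 'c', 'd']

# DataFrame input: rows are looked up by label. None of 'a'..'d' exist
# in data1's index, so every cell comes back as NaN.
aligned = pd.DataFrame(data=data1, index=labels)

# ndarray input: there are no labels to align on, so the values are
# assigned by position. Passing the columns explicitly keeps the names.
positional = pd.DataFrame(data=data1.values, index=labels,
                          columns=data1.columns)

print(aligned)
print(positional)
```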

    EDIT

    After stepping through the code, the issue is that the constructor uses the passed index to reindex the df. We can reproduce this behaviour by doing the following:

    In [46]:
    
    data1 = pd.DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})
    data1.reindex_axis(list('abcd'))
    Out[46]:
       AAA  BBB  CCC
    a  NaN  NaN  NaN
    b  NaN  NaN  NaN
    c  NaN  NaN  NaN
    d  NaN  NaN  NaN
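
In later pandas releases `reindex_axis` was deprecated and then removed, so the snippet above only runs on old versions. A sketch of the same reproduction with `DataFrame.reindex`, assuming a newer pandas; the label `7` is just an arbitrary label that does not exist in the frame.

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})

# None of the requested labels exist, so every row is filled with NaN.
missing = data1.reindex(list('abcd'))

# Labels 0 and 1 exist and keep their values; label 7 does not exist
# and becomes a row of NaN.
partial = data1.reindex([0, 1, 7])

print(missing)
print(partial)
```

The second call shows that reindexing is a lookup by label: rows whose labels are found are kept, and only the unknown labels turn into NaN.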
    

    This is because, on entering the df constructor, it detects that the data is an instance of BlockManager and tries to construct a df from it.

    Stepping through the code, I see that it reaches here in frame.py:

            if isinstance(data, BlockManager):
                mgr = self._init_mgr(data, axes=dict(index=index, columns=columns),
                                     dtype=dtype, copy=copy)
    

    and then ends up here in generic.py:

    def _init_mgr(self, mgr, axes=None, dtype=None, copy=False):
        """ passed a manager and a axes dict """
        for a, axe in axes.items():
            if axe is not None:
                mgr = mgr.reindex_axis(
                    axe, axis=self._get_block_manager_axis(a), copy=False)
    

    An issue has now been posted about this.

    Update: this is expected behaviour. If you pass an index, the constructor uses it to reindex the passed-in df. From @Jeff:

    This is the defined behavior, to reindex the provided input to the passed index and/or columns.
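
Seen this way, passing `index` and/or `columns` together with an existing DataFrame acts as a selection by label. A short sketch, assuming a recent pandas; the name `subset` is illustrative.

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})

# Every requested row and column label exists in data1, so the
# constructor returns the matching slice and no NaN appears.
subset = pd.DataFrame(data1, index=[1, 2], columns=['AAA', 'CCC'])

print(subset)
```

The all-NaN result in the question is the same mechanism applied to labels that match nothing in the input.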

    See related Issue