Search code examples
pythonpandasdataframeinsertmulti-index

pandas multi-index dataframe - data inserted to new column and sub-row only visible in column view?


I am trying to insert a list of data into a multi-level pandas dataframe. It seems to work just fine, but when I view the entire dataframe, the new sub-row is not there. Here is an example:

Create an empty multi-index dataframe:

ind = pd.MultiIndex.from_product([['A','B','C'], ['a', 'b','c']]) #set up index 
df = pd.DataFrame(columns=['col1'], index=ind)                    #create empty df with multi-level nested index
print(df)
    col1
A a  NaN
  b  NaN
  c  NaN
B a  NaN
  b  NaN
  c  NaN
C a  NaN
  b  NaN
  c  NaN

Inserting a new column works fine:

newcol = 'col2'      #new column name
df[newcol] = np.nan  #fill new column with nans
print(df)
    col1  col2
A a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN
B a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN
C a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN

Inserting data into an existing sub-row works with point data but not with a list:

df[newcol]['A','a'] = 1        #works with point data but not with list
print(df)
    col1  col2
A a  NaN   1.0
  b  NaN   NaN
  c  NaN   NaN
B a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN
C a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN

Inserting into new sub-row looks OK when viewing just the one column:

df[newcol]['A','d'] = [1,2,3]  #insert into new sub-row 'd'
print(df[newcol])              #view just new column
A  a            1
   b          NaN
   c          NaN
B  a          NaN
   b          NaN
   c          NaN
C  a          NaN
   b          NaN
   c          NaN
A  d    [1, 2, 3]
Name: col2, dtype: object

But it's not visible when viewing the entire dataframe - why?

print(df) 
    col1  col2
A a  NaN   1.0
  b  NaN   NaN
  c  NaN   NaN
B a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN
C a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN

Also, when I try different methods of inserting the data, I run into issues: Using df.loc[] works perfectly for a single data point, but not for lists:

df.loc[('A','f'),  newcol] = 1          #create new row at [(row,sub-row),column] & insert point data
print(df)                               #works fine
    col1  col2
A a  NaN   1.0
  b  NaN   NaN
  c  NaN   NaN
B a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN
C a  NaN   NaN
  b  NaN   NaN
  c  NaN   NaN
A f  NaN   1.0

Same method but inserting a list returns an error:

df.loc[('A','f'),  newcol] = [1,2,3]    #create new row at [(row,sub-row),column] & insert list data

TypeError: object of type 'numpy.float64' has no len()

Using df.at[] returns error with both point and list data:

data.at[('A','f'), newcol] = [1,2,3] #insert into existing sub-row 'f'

KeyError: ('A', 'f')


Solution

  • when you do df[newcol]['A','d'] = [1,2,3], it is chained-indexing assignment, so the result is unpredictable. Pandas doesn't guarantee correct behaviors when you do chained-indexing. When you run that command, pandas executes with a warning. This warning even includes the link to the full explanation in case you want to know. I don't go into the detail because the link in the warning explains very well on this chained-indexing.

    On assigning list to a cell, it is always a pain. However, it is doable. I guess your issue with df.loc[('A','f'), newcol] = [1,2,3] because col2 is dtype float, so pandas doesn't consider [1,2,3] as a single object list. It considers [1,2,3] as a list of multiple numeric values, so it failed. I don't know whether it is a bug or intentional.

    To solve your issue with .loc, convert col2 to dtype object and do assignment

    df['col2'] = df['col2'].astype('O')
    df.loc[('A','f'),  'col2'] = [1,2,3]
    
    print(df)
    
    Out[1911]:
        col1       col2
    A a  NaN        NaN
      b  NaN        NaN
      c  NaN        NaN
    B a  NaN        NaN
      b  NaN        NaN
      c  NaN        NaN
    C a  NaN        NaN
      b  NaN        NaN
      c  NaN        NaN
    A f  NaN  [1, 2, 3]
    
    print(df['col2'])
    
    Out[1912]:
    A  a          NaN
       b          NaN
       c          NaN
    B  a          NaN
       b          NaN
       c          NaN
    C  a          NaN
       b          NaN
       c          NaN
    A  f    [1, 2, 3]
    Name: col2, dtype: object