I am trying to insert a list of data into a multi-level pandas dataframe. It seems to work just fine, but when I view the entire dataframe, the new sub-row is not there. Here is an example:
Create an empty multi-index dataframe:
ind = pd.MultiIndex.from_product([['A','B','C'], ['a', 'b','c']]) #set up index
df = pd.DataFrame(columns=['col1'], index=ind) #create empty df with multi-level nested index
print(df)
col1
A a NaN
b NaN
c NaN
B a NaN
b NaN
c NaN
C a NaN
b NaN
c NaN
Inserting a new column works fine:
newcol = 'col2' #new column name
df[newcol] = np.nan #fill new column with nans
print(df)
col1 col2
A a NaN NaN
b NaN NaN
c NaN NaN
B a NaN NaN
b NaN NaN
c NaN NaN
C a NaN NaN
b NaN NaN
c NaN NaN
Inserting data into an existing sub-row works with point data but not with a list:
df[newcol]['A','a'] = 1 #works with point data but not with list
print(df)
col1 col2
A a NaN 1.0
b NaN NaN
c NaN NaN
B a NaN NaN
b NaN NaN
c NaN NaN
C a NaN NaN
b NaN NaN
c NaN NaN
Inserting into new sub-row looks OK when viewing just the one column:
df[newcol]['A','d'] = [1,2,3] #insert into new sub-row 'd'
print(df[newcol]) #view just new column
A a 1
b NaN
c NaN
B a NaN
b NaN
c NaN
C a NaN
b NaN
c NaN
A d [1, 2, 3]
Name: col2, dtype: object
But it's not visible when viewing the entire dataframe - why?
print(df)
col1 col2
A a NaN 1.0
b NaN NaN
c NaN NaN
B a NaN NaN
b NaN NaN
c NaN NaN
C a NaN NaN
b NaN NaN
c NaN NaN
Also, when I try different methods of inserting the data, I run into issues: Using df.loc[] works perfectly for a single data point, but not for lists:
df.loc[('A','f'), newcol] = 1 #create new row at [(row,sub-row),column] & insert point data
print(df) #works fine
col1 col2
A a NaN 1.0
b NaN NaN
c NaN NaN
B a NaN NaN
b NaN NaN
c NaN NaN
C a NaN NaN
b NaN NaN
c NaN NaN
A f NaN 1.0
Same method but inserting a list returns an error:
df.loc[('A','f'), newcol] = [1,2,3] #create new row at [(row,sub-row),column] & insert list data
TypeError: object of type 'numpy.float64' has no len()
Using df.at[] returns error with both point and list data:
data.at[('A','f'), newcol] = [1,2,3] #insert into existing sub-row 'f'
KeyError: ('A', 'f')
when you do df[newcol]['A','d'] = [1,2,3]
, it is chained-indexing assignment, so the result is unpredictable. Pandas doesn't guarantee correct behaviors when you do chained-indexing. When you run that command, pandas executes with a warning. This warning even includes the link to the full explanation in case you want to know. I don't go into the detail because the link in the warning explains very well on this chained-indexing.
On assigning list to a cell, it is always a pain. However, it is doable. I guess your issue with df.loc[('A','f'), newcol] = [1,2,3]
because col2
is dtype float
, so pandas doesn't consider [1,2,3]
as a single object list
. It considers [1,2,3]
as a list of multiple numeric values, so it failed. I don't know whether it is a bug or intentional.
To solve your issue with .loc
, convert col2
to dtype object
and do assignment
df['col2'] = df['col2'].astype('O')
df.loc[('A','f'), 'col2'] = [1,2,3]
print(df)
Out[1911]:
col1 col2
A a NaN NaN
b NaN NaN
c NaN NaN
B a NaN NaN
b NaN NaN
c NaN NaN
C a NaN NaN
b NaN NaN
c NaN NaN
A f NaN [1, 2, 3]
print(df['col2'])
Out[1912]:
A a NaN
b NaN
c NaN
B a NaN
b NaN
c NaN
C a NaN
b NaN
c NaN
A f [1, 2, 3]
Name: col2, dtype: object