python, pandas, dataframe, multi-index

How to specify the value for a multiindex in a pandas dataframe?


Suppose I want to build a pandas dataframe with a MultiIndex.

I start by defining the expected columns of the dataframe:

import pandas as pd
df = pd.DataFrame(columns=["val"])

Then I build some entries and their indexing:

for j in range(1,5):
  tuples = [(str(j), i) for i in range(10)]
  vals = [0,1,2,3,j,j,4,4,1,1]

At each iteration of the for loop I would like to update the dataframe with the new values. The _append method does not seem to support specifying an index, and I've read that the .loc method is much more efficient.

So I was trying something like:

for i2, el in enumerate(tuples):
  df.loc[el] = vals[i2]  # el is a tuple

But this does not work as I expected. If I run the assignment with a single MultiIndex key and a single value, for example:

df.loc[('1', 3)] = 4

I get a dataframe that looks like:

   val    3
1  NaN  4.0

whereas I was expecting something like:

      val
1  3  4.0

How to specify the value for a multiindex in a pandas dataframe?


Solution

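  • The parentheses in df.loc[('1', 3)] don't make it a MultiIndex lookup. Python passes the same tuple to .loc with or without them, so it is equivalent to df.loc['1', 3], meaning row '1', column 3.

    To address a row by its MultiIndex key, pass the tuple as the row indexer and name the column explicitly: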

    df.loc[('1', 3), 'val'] = 4
    

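    But this alone is not enough here: the empty dataframe starts with a plain Index, and assigning to it does not turn that Index into a MultiIndex.

    You must define the MultiIndex from the beginning: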

    df = pd.DataFrame(columns=["val",],
                      index=pd.MultiIndex(levels=[[], []], codes=[[], []]))
    df.loc[('1', 3), 'val'] = 4
    

    Output:

        val
    1 3   4
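
If the rows arrive in batches, as in the loop from the question, one possible alternative to assigning cell by cell with .loc is to build a small dataframe per batch and concatenate them once at the end. This is only a sketch under that assumption. The `frames` list and the `value` variable are names introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

frames = []  # hypothetical accumulator: one DataFrame per loop iteration
for j in range(1, 5):
    tuples = [(str(j), i) for i in range(10)]
    vals = [0, 1, 2, 3, j, j, 4, 4, 1, 1]
    # Turn the (str, int) tuples into a two-level MultiIndex for this batch.
    index = pd.MultiIndex.from_tuples(tuples)
    frames.append(pd.DataFrame({"val": vals}, index=index))

# Stack all batches into a single dataframe with a two-level index.
df = pd.concat(frames)

# Look up one cell by its full MultiIndex key plus the column name.
value = df.loc[("2", 4), "val"]
```

Here `value` is the entry stored at row ('2', 4). Single cells can still be updated afterwards with the df.loc[(level0, level1), 'val'] = ... form from the answer.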