Tags: python, pandas, dataframe, multi-index

Pandas: how to add data to a MultiIndex empty DataFrame?


I would like to use a MultiIndex DataFrame to easily select portions of the DataFrame. I created an empty DataFrame as follows:

import pandas as pd
mi = {'input':['a','b','c'],'optim':['pareto','alive']}
mi = pd.MultiIndex.from_tuples([(c,k) for c in mi.keys() for k in mi[c]])
mc = pd.MultiIndex(names=['Generation','Individual'],labels=[[],[]],levels=[[],[]])
population = pd.DataFrame(index=mi,columns=mc)

which seems to be good. However, I do not know how to insert a single value to start populating my DataFrame. I tried the following:

population.loc[('optim','pareto'),(0,0)]=True

where I tried to define a new column with the double index (0,0), which led to a NotImplementedError. I also tried (0,1), which gave a ValueError.

I also tried without a column index:

population.loc[('optim','pareto')]=True

which gave no error, but did not change the DataFrame either. Any help? Thanks in advance.

EDIT To clarify my question, once populated, my DataFrame should look like this:

Generation     1               2
Individual     1    2    3     4    5     6
input       a  1    1    2     ...
            b  1    2    2     ...
            c  1    1    2     ...
optim  pareto  True True False ...
        alive  True True False ...

EDIT 2 I found out that what I was doing works if I define my first column at the DataFrame creation. In particular with:

mc = pd.MultiIndex.from_tuples([(0,0)])

I get a first column full of NaN and I can add data as I wanted to (including in new columns):

population.loc[('optim','pareto'),(0,1)]=True

I still do not know what is wrong with my first definition...


Solution

  • Although I do not know why my initial definition was wrong, the following works as expected:

    mi = {'input':['a','b','c'],'optim':['pareto','alive']}
    mi = pd.MultiIndex.from_tuples([(c,k) for c in mi.keys() for k in mi[c]])
    mc = pd.MultiIndex.from_tuples([(0,0)],names=['Generation','Individual'])
    population = pd.DataFrame(index=mi,columns=mc)
    

    It looks like the solution was to initialize the columns when creating the DataFrame (here with a (0,0) column). The created DataFrame is then:

    Generation      0
    Individual      0
    input a       NaN
          b       NaN
          c       NaN
    optim pareto  NaN
          alive   NaN
    

    which can then be populated by adding values to the existing column or to new columns/rows.
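
As a rough illustration of that last step, here is a minimal sketch of how the DataFrame might be populated and then sliced. The dictionary name `rows`, the sample values in the new column, and the use of `population[(0, 1)] = ...` to append a column are assumptions made for this example, not part of the original answer; behaviour may differ between pandas versions.

```python
import pandas as pd

# Row index: two groups ('input', 'optim') with their sub-labels.
rows = {'input': ['a', 'b', 'c'], 'optim': ['pareto', 'alive']}
mi = pd.MultiIndex.from_tuples([(grp, key) for grp in rows for key in rows[grp]])

# Start with a single (0, 0) column so the column MultiIndex is not empty.
mc = pd.MultiIndex.from_tuples([(0, 0)], names=['Generation', 'Individual'])
population = pd.DataFrame(index=mi, columns=mc)

# Fill one cell of the existing (0, 0) column.
population.loc[('optim', 'pareto'), (0, 0)] = True

# Append a new column for generation 0, individual 1 (values are made up).
population[(0, 1)] = pd.Series([1, 2, 2, True, False], index=mi, dtype=object)

# Select portions: all individuals of generation 0, and all 'optim' rows.
gen0 = population[0]
optim = population.loc['optim']
```

Keeping the columns as object dtype lets integer inputs and boolean flags share a column, which matches the mixed layout sketched in the question.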