Search code examples
pythonpandaspandas-resample

Pandas - MultiIndex resample - I don´t want to lose informations from other indexes


I´ve got the following DataFrame:

                        value
A   B
111 2024-03-22 00:00:00 1
111 2024-03-22 01:00:00 2
111 2024-03-22 02:00:00 3
222 2024-03-22 00:00:00 4
222 2024-03-22 01:00:00 5
222 2024-03-22 02:00:00 6

Now I want to resample and sum index B to days and would expect the following result:

                        value
A   B
111 2024-03-22 00:00:00 6
222 2024-03-22 00:00:00 15

How can I achieve something like that?

Another Example would be the following:

                        value
A   B
111 2024-03-22 00:00:00 1
111 2024-03-22 01:00:00 2
111 2024-03-22 02:00:00 3
222 2024-03-22 00:00:00 4
222 2024-03-22 01:00:00 5
222 2024-03-22 02:00:00 6
333 2024-03-22 05:00:00 7

Of which I want the following result with resampling by 1h:

                        value
A   B
111 2024-03-22 00:00:00 1
111 2024-03-22 01:00:00 2
111 2024-03-22 02:00:00 3
111 2024-03-22 03:00:00 0
111 2024-03-22 04:00:00 0
111 2024-03-22 05:00:00 0
222 2024-03-22 00:00:00 4
222 2024-03-22 01:00:00 5
222 2024-03-22 02:00:00 6
222 2024-03-22 03:00:00 0
222 2024-03-22 04:00:00 0
222 2024-03-22 05:00:00 0
333 2024-03-22 00:00:00 0
333 2024-03-22 01:00:00 0
333 2024-03-22 02:00:00 0
333 2024-03-22 03:00:00 0
333 2024-03-22 04:00:00 0
333 2024-03-22 05:00:00 7

Pandas Version: 2.0.1

I tried using level on resample but that way I lose Index A.

I have the same issue when I have two timestamps in the index and want one to be resampled to days and the other to hours.

I´ve looked at other answers of related questions here but couldn´t find a way to get them working for me.

I´m confusing myself already and it could be that I just can´t see the correct way to achieve this. :)

If you need anymore information please let me know.

Thanks in advance.

Regards Oliver


Solution

  • You need to groupby before you resample to preserve the A index.

    import pandas as pd
    
    df = pd.DataFrame.from_dict({'value': 
     {(111, pd.Timestamp('2024-03-22 00:00:00')): 1,
      (111, pd.Timestamp('2024-03-22 01:00:00')): 2,
      (111, pd.Timestamp('2024-03-22 02:00:00')): 3,
      (222, pd.Timestamp('2024-03-22 00:00:00')): 4,
      (222, pd.Timestamp('2024-03-22 01:00:00')): 5,
      (222, pd.Timestamp('2024-03-22 02:00:00')): 6}}
    )
    
    df.groupby(level=0).resample('d', level=1).sum()
    # returns:
                    value
    A   B
    111 2024-03-22      6
    222 2024-03-22     15