Tags: python, pandas, dataframe, slice, multi-index

Slice multi-index pandas dataframe by date


Say I have the following multi-index dataframe:

import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo', 'foo']),
          pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'])]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)

                 0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0

How do I select only the rows of this dataframe where the first index level is 'bar' and the date is later than 2020-01-02, so that I can add 1 to that part?

To be clearer, the expected output would be:

                 0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  1.0  1.0  1.0  1.0
    2020-01-04  1.0  1.0  1.0  1.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0

I managed to slice it by the first index level:

df.loc['bar']

But I am then unable to apply the condition on the date.


Solution

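  • One option is to build a boolean mask for each index level, combine the masks, and assign through DataFrame.loc, where : selects all columns. Because the frame starts at zero, setting the selection to 1 gives the same result as adding 1; for non-zero data, use += 1 instead: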

    # Boolean mask per index level
    m1 = df.index.get_level_values(0) == 'bar'
    m2 = df.index.get_level_values(1) > '2020-01-02'

    # Rows where both masks hold, all columns
    df.loc[m1 & m2, :] = 1
    print(df)
    
                      0    1    2    3
    bar 2020-01-01  0.0  0.0  0.0  0.0
        2020-01-02  0.0  0.0  0.0  0.0
        2020-01-03  1.0  1.0  1.0  1.0
        2020-01-04  1.0  1.0  1.0  1.0
    foo 2020-01-01  0.0  0.0  0.0  0.0
        2020-01-02  0.0  0.0  0.0  0.0
        2020-01-03  0.0  0.0  0.0  0.0
        2020-01-04  0.0  0.0  0.0  0.0
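
Since the question asks to add 1 rather than set the values to 1, here is a minimal sketch of the increment. It starts from non-zero values (illustrative data, not from the question) so that the two operations differ. It also shows a label-slicing variant with pd.IndexSlice as a possible alternative. The names df_alt and idx are introduced only for this example, and the slice start of '2020-01-03' assumes daily data, because label slices include their endpoints.

```python
import numpy as np
import pandas as pd

# Same index as in the question, but with non-zero starting values
# (illustrative data) so that "add 1" and "set to 1" give different results.
arrays = [
    np.array(['bar'] * 4 + ['foo'] * 4),
    pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'] * 2),
]
df = pd.DataFrame(np.full((8, 4), 5.0), index=arrays)
df_alt = df.copy()

# Variant 1: boolean masks per index level, then an in-place increment.
m1 = df.index.get_level_values(0) == 'bar'
m2 = df.index.get_level_values(1) > pd.Timestamp('2020-01-02')
df.loc[m1 & m2, :] += 1

# Variant 2: label slicing with pd.IndexSlice. The index should be sorted,
# and label slices include their endpoints, so the slice starts at the first
# date after the cutoff (an assumption that holds for this daily data).
df_alt = df_alt.sort_index()
idx = pd.IndexSlice
df_alt.loc[idx['bar', '2020-01-03':], :] += 1

print(df)
print(df.equals(df_alt))  # compares the two variants
```

The mask variant expresses a strict "greater than" directly and does not require a sorted index, so it is probably the safer default. The IndexSlice variant is shorter when the cutoff coincides with a known label.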