Search code examples
pythonpandasdataframemulti-index

Iterate by level in Pandas multi-index


Essentially, I have a multi-index that is first date-time, then integer value of the week number.

I want to select a week number, check some conditions of the data for that week, and if those conditions are satisfied, get the date-time level of the index.

Here's a simplified version:

# Dataframe creation
index = pd.date_range('1/1/2019', '1/4/2019')
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B':[3, 4, 5, 6], 'C':['a', 'a', 'b', 'b']}, index=index)

print(df)

            A  B  C
2019-01-01  1  3  a
2019-01-02  2  4  a
2019-01-03  3  5  b
2019-01-04  4  6  b

Then,

# make a multi-index with last column
df = df.set_index([df.index, 'C'])
print(df)

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4
2019-01-03 b  3  5
2019-01-04 b  4  6

Again, fine. According to this answer, I would expect that df.loc[['a']] would yield

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4

However, I get a key/index error. I instead tried df[df['C'] == 'a'], but also with a resulting key/index error.

So the psuedo-code for what I want to achieve is something like:

df = df.set_index([df.index, 'C'])

for value in 'C' level:
    check some condition on 'B' variable.
    if condition:
       get date level for this value

This seems like it would be fairly simple if I could figure out how to slice by specified index level... What am I missing?

Note:

I realize that I could just leave the C variable as a column and then do something like:

for c in df.C.unique():
    if (df[df.C == c].B >= 4).any():
        dates = df[df.C == c].index
        print(dates)

However, now I'm fixated on understanding multi-indexes just to learn Pandas better.


Solution

  • Quick fix: query works with level names:

    df.query('C=="a"')
    

    Output:

                  A  B
               C      
    2019-01-01 a  1  3
    2019-01-02 a  2  4
    

    A bit more details on multilevel indexing from the official doc. If you have multilevel index, you slice it by tuples (first_level, second_level):

    df.loc[('2019/01/01', 'a')]
    

    So you can get all the rows with a second-level value:

    df.loc[(slice(None), 'a'), :]
    

    and you get:

                  A  B
               C      
    2019-01-01 a  1  3
    2019-01-02 a  2  4
    

    or you can use xs:

    df.xs('a', level='C')
    

    which give (note that the second level index is gone):

                A  B
    2019-01-01  1  3
    2019-01-02  2  4
    

    Finally, what you are trying to do for c in df.C.unique(): is more of groupby:

    df.loc[df.groupby('C')['B'].transform('max')>=4].index
    

    Output:

    MultiIndex([('2019-01-01', 'a'),
                ('2019-01-02', 'a'),
                ('2019-01-03', 'b'),
                ('2019-01-04', 'b')],
               names=[None, 'C'])