Search code examples
pandasmulti-index

Pandas MultiIndex slices and indexing


I'm just starting work with multiframes and I'm having a little trouble with the fairly sparse documentation and online examples on slicing and indexing.

Consider the following mutiframe

import pandas as pd
import numpy as np
levels={
'produce_source':['Vendor A', 'Vendor B'],
'day':['mon','wed','fri'],
'chiller_temp':['low','mid'],
'fruit':['apples','pears','nanas']
}

index = pd.MultiIndex.from_product(levels.values(), names = list(levels.keys()))
df = pd.DataFrame(index=index)
df = df.assign(deliveries=np.random.rand(len(df)))


                                        deliveries
produce_source day chiller_temp fruit             
Vendor A       mon low          apples    0.748376
                                pears     0.639824
                                nanas     0.604342
                   mid          apples    0.160837
                                pears     0.970412
                                nanas     0.301815
               wed low          apples    0.572627
                                pears     0.254242
                                nanas     0.590702
                   mid          apples    0.153772
                                pears     0.180117
                                nanas     0.858085
               fri low          apples    0.535358
                                pears     0.576359
                                nanas     0.893993
                   mid          apples    0.334602
                                pears     0.053892
                                nanas     0.778767
Vendor B       mon low          apples    0.565761
                                pears     0.437994
                                nanas     0.090994
                   mid          apples    0.261041
                                pears     0.028795
                                nanas     0.057612
               wed low          apples    0.808108
                                pears     0.914724
                                nanas     0.020663
                   mid          apples    0.055319
                                pears     0.888612
                                nanas     0.623370
               fri low          apples    0.419422
                                pears     0.938593
                                nanas     0.358441
                   mid          apples    0.534191
                                pears     0.590103
                                nanas     0.753034

What's the most pythonic way to achieve the following

1) View all the wed data as a slice

1a) stretch goal: don't care that 'day' is index.names[1], instead index by index name 'day'

2) Write an iterable of data only to that wed slice

3) add a chiller_temp of high for all vendors and days and fruits

I saw some slicing happening using idx = pd.IndexSlice.

idx = pd.IndexSlice
df_wip = df.loc[idx[:,'wed'], ] #1)  
#would love to write to df_wip sliced df here but get slice copy warning with df_wip['deliveries'] = list(range(0,100*len(df_wip),100)) 
df = df.loc[idx[:,'wed'],'deliveries'] = list(range(0,100*len(df_wip),100)) #2)

This raises an error AttributeError: 'list' object has no attribute 'loc'

df = df.loc[idx[:,'wed'],'deliveries'] = pd.Series(range(0,100*len(df_wip),100)) #2)

raises TypeError: unhashable type: 'slice'


Solution

  • 1) View all the wed data as a slice

    For viewing data in a multiindex, it's a lot easier to use .xs (cross section), which allows you to specify values for specific index levels instead of making you type out all of the levels like .loc w/ slice will make you do:

    df.xs('wed', level='day')
    
    Out:
                                            deliveries
    produce_source  chiller_temp    fruit   
    Vendor A        low             apples  0.521861
                                    pears   0.741856
                                    nanas   0.245843
                    mid             apples  0.471135
                                    pears   0.191322
                                    nanas   0.153920
    Vendor B        low             apples  0.711457
                                    pears   0.211794
                                    nanas   0.599071
                    mid             apples  0.303910
                                    pears   0.657348
                                    nanas   0.111750
    

    2) Write an iterable of data only to that wed slice

    If I understand this correctly, you're trying to replace the values in the 'deliveries' column with a specific iterable (say a list) where the day is 'wed'. Unfortunately .loc-type replacing doesn't work in this instance. As far as I know, pandas only has easy syntax for replacing the value of a single cell in this way using .at or .loc (see this SO answer). However, we can use iterrows to accomplish this:

    idx = pd.IndexSlice
    
    # If we don't change the column's type, which was float, this will error
    df['deliveries'] = df['deliveries'].astype(object)
    
    # Loop through rows, replacing single values
    # Only necessary if the new assigned value is mutable
    for index, row in df.loc[idx[:,'wed'], 'deliveries':'deliveries'].iterrows():
        df.at[index, 'deliveries'] = ["We", "changed", "this"]
    
    df.head(10)
    
    Out:
                                                deliveries
    produce_source  day  chiller_temp   fruit   
    Vendor A        mon  low            apples  0.0287606
                                        pears   0.264512
                                        nanas   0.238089
                         mid            apples  0.814985
                                        pears   0.590967
                                        nanas   0.919351
                    wed  low            apples  [We, changed, this]
                                        pears   [We, changed, this]
                                        nanas   [We, changed, this]
                         mid            apples  [We, changed, this]
    

    While the looping is required to my knowledge, using df.xs and then df.update instead of .loc is more understandable in my option. For example, the following code does the same as the .loc code above:

    df['deliveries'] = df['deliveries'].astype(object)
    
    # Create a temporary copy of our cross section
    df2 = df.xs('wed', level='day', drop_level=False)
    
    # The same loop as before
    for index, row in df2.iterrows():
        df2.at[index, 'deliveries'] = ["We", "changed", "this"]
    
    # Update the original df for the values we want from df2
    df.update(df2, join="left", overwrite=True, filter_func=None, raise_conflict=False)
    

    3) add a chiller_temp of high for all vendors and days and fruits

    Replacing values in existing levels of a multiindex requires replacing the whole level. This can either be accomplished with df.index.set_levels (easier way IMO) or pd.MultiIndex.from_arrays. Depending on the exact use case map and/or replace might be of use. Check out this SO answer for some other examples.

    df.index = df.index.set_levels(['high' for v in df.index.get_level_values('chiller_temp')], level='chiller_temp')
    

    4) I saw some slicing happening using idx = pd.IndexSlice...This raises an error AttributeError: 'list' object has no attribute 'loc'...raises TypeError: unhashable type: 'slice'

    For the AttributeError: 'list' object has no attribute 'loc' and TypeError: unhashable type: 'slice' errors you just have two assignments in those lines.

    It appears like your .loc syntax is correct except that you can't assign pd.Series this way without causing the cell value to be NaN (see answer to 2) for correct syntax). This works:

    idx = pd.IndexSlice
    df.loc[idx[:,'wed'], 'deliveries':'deliveries'] = "We changed this"