Tags: python, pandas, dataframe, indexing, multi-index

Pandas MultiIndex: get level values without duplicates


I'm sure this is pretty trivial, but I'm new to Python/pandas.

I want to get one level of my MultiIndex (the names of my measurements) as a list, so I can use it in a for loop later to name and save my plots. I'm pretty confident in getting the data I need from my DataFrame, but I can't figure out how to get a particular level out of my index.

While writing the question I kind of figured the answer out, but it still seems clunky. There has to be a direct command to do this. This is my code:

a = df.index.get_level_values('File')
a = a.drop_duplicates()
a = a.values

Solution

  • index.levels

    You can access unique elements of each level of your MultiIndex directly:

    import pandas as pd

    df = pd.DataFrame([['A', 'W', 1], ['B', 'X', 2], ['C', 'Y', 3],
                       ['D', 'X', 4], ['E', 'Y', 5]])
    df = df.set_index([0, 1])

    # Unique values of the second index level
    a = df.index.levels[1]

    print(a)
    # Index(['W', 'X', 'Y'], dtype='object', name=1)
    

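    To understand the information available, see how the MultiIndex is stored internally. The output below comes from an older pandas release; since pandas 0.24 the labels field is named codes, and newer releases print the index as a list of tuples instead: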

    print(df.index)
    
    MultiIndex(levels=[['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y']],
               labels=[[0, 1, 2, 3, 4], [0, 1, 2, 1, 2]],
               names=[0, 1])
    

    However, the below methods are more intuitive and better documented.
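One practical reason to be careful with index.levels: it describes how the index is stored, not necessarily which values are still present. After filtering rows, the levels can still list values that no longer occur. The following is a minimal sketch of that behaviour, reusing the example frame from above; the filter on column 2 is just an illustration.

```python
import pandas as pd

df = pd.DataFrame([['A', 'W', 1], ['B', 'X', 2], ['C', 'Y', 3],
                   ['D', 'X', 4], ['E', 'Y', 5]])
df = df.set_index([0, 1])

# Keep only the rows whose value column (named 2) is at least 4.
subset = df[df[2] >= 4]

# levels still lists every value from the original index.
stale = list(subset.index.levels[1])

# get_level_values reflects only the rows that are actually present.
present = list(subset.index.get_level_values(1).unique())

# remove_unused_levels() rebuilds the levels from the remaining rows.
cleaned = list(subset.index.remove_unused_levels().levels[1])

print(stale, present, cleaned)
```

Here stale still contains 'W' even though no remaining row uses it, while present and cleaned contain only 'X' and 'Y'.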

    One point worth noting is that you don't have to explicitly extract the NumPy array via the values attribute. You can iterate over Index objects directly. In addition, method chaining is possible and encouraged with pandas.
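As a rough sketch of the use case in the question, the loop below chains the calls and iterates over the resulting Index without touching values. The frame, the 'Sample' and 'Value' names, and the '.png' file names are made up for illustration; only the 'File' level name comes from the question.

```python
import pandas as pd

# Hypothetical data with a 'File' level like the one in the question.
df = pd.DataFrame(
    {'File': ['run1', 'run1', 'run2', 'run2'],
     'Sample': [0, 1, 0, 1],
     'Value': [0.1, 0.2, 0.3, 0.4]}
).set_index(['File', 'Sample'])

plot_names = []
row_counts = []

# Chain the calls and loop over the Index directly; no .values needed.
for name in df.index.get_level_values('File').unique():
    rows = df.xs(name, level='File')    # all rows belonging to this file
    row_counts.append(len(rows))
    plot_names.append(f'{name}.png')    # e.g. the name for a saved plot

print(plot_names)
```

Inside the loop, rows is the part of the frame you would plot before saving under the generated name.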

    drop_duplicates / unique

    Both return an Index object, with the order of first appearance preserved.

    a = df.index.get_level_values(1).drop_duplicates()
    # equivalently, df.index.get_level_values(1).unique()
    
    print(a)
    # Index(['W', 'X', 'Y'], dtype='object', name=1)
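If a single call is preferred, Index.unique also accepts a level argument on a MultiIndex, which is probably the closest thing to the "direct command" the question asks for. The sketch below uses a frame whose rows are deliberately not in alphabetical order (an assumption made for the demonstration) to show that unique keeps the order of first appearance while levels is sorted.

```python
import pandas as pd

# Second-level values appear in the order Y, W, Y, X.
df = pd.DataFrame([['A', 'Y', 1], ['B', 'W', 2], ['C', 'Y', 3], ['D', 'X', 4]])
df = df.set_index([0, 1])

# One call: unique values of level 1, in order of first appearance.
by_unique = list(df.index.unique(level=1))

# The chained form gives the same result.
by_chain = list(df.index.get_level_values(1).unique())

# levels is sorted, so the order differs here.
by_levels = list(df.index.levels[1])

print(by_unique, by_chain, by_levels)
```

With a named level, such as 'File' in the question, the same call would be df.index.unique(level='File').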
    

    set

    Returns a set. Useful for O(1) membership tests, but the result is unordered.

    a = set(df.index.get_level_values(1))
    
    print(a)
    # {'X', 'Y', 'W'}  (element order may differ between runs)
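A short sketch of when the set form is handy: checking whether a value occurs in the level at all. If a stable order is needed afterwards, sorted() turns the set back into a list. The looked-up values 'X' and 'Z' are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame([['A', 'W', 1], ['B', 'X', 2], ['C', 'Y', 3],
                   ['D', 'X', 4], ['E', 'Y', 5]])
df = df.set_index([0, 1])

seen = set(df.index.get_level_values(1))

has_x = 'X' in seen    # constant-time membership test
has_z = 'Z' in seen    # 'Z' never occurs in level 1

# Sorting gives a deterministic list when the order matters.
ordered = sorted(seen)

print(has_x, has_z, ordered)
```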