python pandas multi-index

How to access object in pandas multiindex while ignoring one level


I am trying to find a good way to write the loop below. I have a lot of data I need to access: each data point has an ID number and the time at which it was gathered, and for each data point I gathered several variables (stored as DataFrames, Series, or plain numbers, as appropriate). So I store my data in a Series with a MultiIndex, which ends up looking like this:

df =

ID No    Time    Variable
123      0.1     A         (Dataframe)
                 B         (Dataframe)
                 C         (Dataframe)
127      0.8     A         (Dataframe)
                 B         (Dataframe)
                 C         (Dataframe)
...
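For experimenting, here is a minimal sketch that builds a Series shaped like this. The IDs, times and DataFrame contents are made up for illustration; what matters is the three-level MultiIndex named "ID No", "Time" and "Variable", with one DataFrame stored per cell.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: two (ID No, Time) pairs, each with variables A, B, C.
keys = [(123, 0.1, v) for v in "ABC"] + [(127, 0.8, v) for v in "ABC"]

# Use an object array so each cell holds a whole DataFrame, rather than
# letting numpy try to stack the frames into one bigger array.
values = np.empty(len(keys), dtype=object)
for k, (id_no, t, var) in enumerate(keys):
    values[k] = pd.DataFrame({"id": [id_no], "time": [t], "var": [var]})

index = pd.MultiIndex.from_tuples(keys, names=["ID No", "Time", "Variable"])
df = pd.Series(values, index=index)

# Show just the index structure (printing the stored frames is noisy).
print(df.index.to_frame(index=False))
```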

The catch is that sometimes it's convenient to gather data by ID number ("give me all data with IDs in this range"), and other times it's easier to gather it by time ("give me all data between this time and that time").

It's probably important to note that my "time" values are frequently not known ahead of time and are not as 'neat' as shown here (they may be 0.1236943, for example).
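Because the exact times aren't known in advance, range queries are safer than exact lookups. One option (a sketch on made-up data; the bounds 120 to 125 and 0.05 to 0.5 are arbitrary) is a boolean mask built from `Index.get_level_values`, which works the same way for either level and does not need the index to be sorted. `pd.IndexSlice` label slicing is shown as an alternative that does need a sorted index.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's Series.
keys = [(123, 0.1, v) for v in "ABC"] + [(127, 0.8, v) for v in "ABC"]
values = np.empty(len(keys), dtype=object)
for k, (id_no, t, var) in enumerate(keys):
    values[k] = pd.DataFrame({"id": [id_no], "time": [t], "var": [var]})
df = pd.Series(values, index=pd.MultiIndex.from_tuples(keys, names=["ID No", "Time", "Variable"]))

# Range by ID: compare the values of one level; other levels are kept as-is.
id_vals = df.index.get_level_values("ID No")
ids_in_range = df.loc[(id_vals >= 120) & (id_vals <= 125)]

# Range by time: a range check, so "messy" floats like 0.1236943 are no problem.
time_vals = df.index.get_level_values("Time")
times_in_range = df.loc[(time_vals > 0.05) & (time_vals < 0.5)]

# Alternative: label slicing on the Time level (requires a sorted index).
times_sliced = df.sort_index().loc[pd.IndexSlice[:, 0.05:0.5, :]]
```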

The problem I'm having is how to access my variables when I'm processing the data. For example, let's say I want to loop over all IDs in the dataset. I might do this:

for i in df.index.get_level_values("ID No").unique(): # This could be "Time" instead of "ID No" if I wanted to loop over times.
    thisData = df[i]

but when I do this, what I get is:

thisData =

Time    Variable
0.1     A          (Dataframe)
        B          (Dataframe)
        C          (Dataframe)

Since I may not know the value of Time, how can I access my variables (or more specifically, their stored data)? I get a KeyError if I try something like thisData['A'].
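A sketch of one way to reach the variables without knowing Time, on made-up data built like the question's Series. `df[i]` matches only the outermost level, so `thisData['A']` then looks for `'A'` among the Time values and raises KeyError. `Series.xs` with `level=` selects on a named level instead. For looping over times, taking each `t` from the index itself means no exact float ever has to be typed in.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's Series.
keys = [(123, 0.1, v) for v in "ABC"] + [(127, 0.8, v) for v in "ABC"]
values = np.empty(len(keys), dtype=object)
for k, (id_no, t, var) in enumerate(keys):
    values[k] = pd.DataFrame({"id": [id_no], "time": [t], "var": [var]})
df = pd.Series(values, index=pd.MultiIndex.from_tuples(keys, names=["ID No", "Time", "Variable"]))

# df[123] selects on the outermost level only, leaving (Time, Variable).
this_data = df[123]

# Select on the "Variable" level by name; that level is dropped from the result.
a_series = this_data.xs("A", level="Variable")  # Series indexed by Time
a_frame = a_series.iloc[0]                       # the stored DataFrame itself

# xs() also works on inner levels, so the same pattern can loop over times.
ids_per_time = {}
for t in df.index.get_level_values("Time").unique():
    by_time = df.xs(t, level="Time")             # Series indexed by (ID No, Variable)
    ids_per_time[t] = sorted(set(by_time.index.get_level_values("ID No")))
```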

As a related note, I will always analyze the variables together as a group, i.e. for each iteration of the loop I analyze only A, B, and C for one specific ID No/Time value. Given this, is there a better way to write this loop?
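Since A, B and C are always analyzed together for one ID No/Time pair, one option (a sketch, not the only way) is to group on both levels at once and then drop them, leaving a small Series indexed only by Variable. `results` and the tuple stored in it are hypothetical placeholders for the real analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's Series.
keys = [(123, 0.1, v) for v in "ABC"] + [(127, 0.8, v) for v in "ABC"]
values = np.empty(len(keys), dtype=object)
for k, (id_no, t, var) in enumerate(keys):
    values[k] = pd.DataFrame({"id": [id_no], "time": [t], "var": [var]})
df = pd.Series(values, index=pd.MultiIndex.from_tuples(keys, names=["ID No", "Time", "Variable"]))

results = {}
# Each group is one (ID No, Time) pair; the key is a tuple of both values,
# so Time never has to be known in advance.
for (id_no, t), group in df.groupby(level=["ID No", "Time"]):
    variables = group.droplevel(["ID No", "Time"])  # Series indexed by Variable only
    a, b, c = variables["A"], variables["B"], variables["C"]  # the stored DataFrames
    results[(id_no, t)] = (a, b, c)  # placeholder for analyzing A, B and C together
```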


Solution

  • I ended up figuring it out. The solution was a combination of changing how my loop iterates and understanding MultiIndex indexing a bit better, which mostly involved experimenting until it worked (see the pandas MultiIndex docs).

    I changed my loop from my question to:

    for i, frame in df.groupby(level="ID No"):
        ...  # process the data for this ID

    This gives me the part of the Series for a given ID number (I could also have grouped by Time). Then I can access a given variable ('A' in this example) using:

    frame[:,:,'A'].iloc[0]
    

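    That final .iloc[0] is needed because frame[:,:,'A'] on its own returns a Series (indexed by the remaining ID No and Time levels) that contains the stored DataFrame, not the DataFrame itself; .iloc[0] pulls out the stored DataFrame. If an ID has more than one Time, .iloc[0] picks only the first one.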
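A runnable version of this solution on made-up data, with `Series.xs` shown as an equivalent spelling that names the level instead of relying on its position. `picked` is a hypothetical name used only to collect the results.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's Series.
keys = [(123, 0.1, v) for v in "ABC"] + [(127, 0.8, v) for v in "ABC"]
values = np.empty(len(keys), dtype=object)
for k, (id_no, t, var) in enumerate(keys):
    values[k] = pd.DataFrame({"id": [id_no], "time": [t], "var": [var]})
df = pd.Series(values, index=pd.MultiIndex.from_tuples(keys, names=["ID No", "Time", "Variable"]))

picked = {}
for i, frame in df.groupby(level="ID No"):
    # frame keeps all three index levels. [:, :, "A"] matches "A" on the third
    # level and drops it, leaving a Series indexed by (ID No, Time).
    a_positional = frame[:, :, "A"].iloc[0]
    # Same selection by level name; also drops the Variable level.
    a_named = frame.xs("A", level="Variable").iloc[0]
    picked[i] = (a_positional, a_named)
```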