python pandas dataframe multi-index

Create a hash using multi-index values in a dataframe


Let's suppose we have the following dataframe:

>> import pandas as pd
>> df = pd.DataFrame(
       columns=['Film', 'Rating', 'Name'],
       data=[['Alien', 9, 'Jane'], ['Alien', 7, 'Mark'],
             ['LOTR', 8, 'Jack'], ['LOTR', 6, 'John']])

>> df = df.set_index(['Film', 'Rating'])
>> df
              Name
Film  Rating      
Alien 9       Jane
      7       Mark
LOTR  8       Jack
      6       John

I want to create a hash column from the multi-index values, something like this:

              Name     hash
Film  Rating               
Alien 9       Jane  Alien/9
      7       Mark  Alien/7
LOTR  8       Jack   LOTR/8
      6       John   LOTR/6

I tried this:

df['hash'] = df.apply(lambda x: '/'.join(x.index), axis = 1)

but with axis=1, x.index refers to the row's column labels, not to the row's own index values, so I'm stuck. I'm also not sure whether int and str values can be concatenated.

EDIT: To be clear, I need something that also works with a longer multi-index (3+ levels) and with int values in it.


Solution

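  • Try (each level is cast to str first, because the int Rating level cannot be concatenated with a string directly):

    df["hash"] = (
        df.index.get_level_values(level=0).astype(str)
        + "/"
        + df.index.get_level_values(level=1).astype(str)
    )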
    print(df)
    

    Prints:

                  Name     hash
    Film  Rating               
    Alien 9       Jane  Alien/9
          7       Mark  Alien/7
    LOTR  8       Jack   LOTR/8
          6       John   LOTR/6
    

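    Or, for any number of levels (each value is converted to str before joining, so int levels work too):

    df["hash"] = df.index.map(lambda key: "/".join(map(str, key)))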
    print(df)
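
The question's EDIT asks for something that handles 3+ index levels and int values. A minimal sketch of that case follows; the extra `Year` level and the `index_hash` helper are made up for illustration and are not part of the original question or answer. It also shows a row-wise variant of the original `apply` attempt, which relies on `row.name` holding the row's index tuple when `axis=1`.

```python
import pandas as pd

# Same data as in the question, plus an illustrative third index level ("Year").
df = pd.DataFrame(
    columns=["Film", "Rating", "Year", "Name"],
    data=[
        ["Alien", 9, 1979, "Jane"],
        ["Alien", 7, 1979, "Mark"],
        ["LOTR", 8, 2001, "Jack"],
        ["LOTR", 6, 2001, "John"],
    ],
).set_index(["Film", "Rating", "Year"])


def index_hash(frame, sep="/"):
    """Hypothetical helper: join all MultiIndex levels into one string per row."""
    # Each MultiIndex entry is a tuple; str() makes the int levels joinable.
    return frame.index.map(lambda key: sep.join(map(str, key)))


df["hash"] = index_hash(df)

# Row-wise variant of the original attempt: with axis=1, row.name is the
# row's index tuple, whereas row.index holds the column labels.
via_apply = df.apply(lambda row: "/".join(map(str, row.name)), axis=1)

print(df)
```

The `index.map` version avoids a Python-level call per row over the whole frame and is usually the simpler choice; the `apply` version is shown only because it is the closest fix to the code in the question.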