Search code examples
pythonmulti-indexpanel-data

For each unique index value in MultiIndex level 0, print index if values (strings) in another column are not unique


I'm working with panel data looking like this (only relevant columns included):

Ticker Year Account_number Industry
AAA    2018  xxxx           Fossil
       2019  xxxx           Fossil
       2020  xxxx           Fossil
BBB    2018  yyyy           Materials
       2019  yyyy           Services
       2020  yyyy           Materials
CCC    2018  zzzz           Services
       2019  zzzz           Services
       2020  zzzz           Services

Tickers (level 0 of MultiIndex) are used to identify individual and unique units in the panel. Each unit is observed over 3 years (level 1 of MultiIndex).

When I groupby('Industry') I end up double-counting the units since the same ticker is associated with more than one industry (as with ticker 'BBB').

The goal is to identify and print the tickers having this issue, and to assign them to a single industry.

I'm thinking of some code that returns the ticker if the string in the industry column is not unique, so that I can manually change it later.

Thanks for your help!

PS This is my first question here so pls let me know if you want me to be more specific or show more details about the df


Solution

  • If all of the values for Industry should be the same for each Ticker then you should do this the other way round.

    Instead of using groupby() on Industry, use groupby() on Ticker and then loop through the data frames and return only those for which grouped_df.Ticker.nunique() > 1