I'm working with panel data looking like this (only relevant columns included):
Ticker Year Account_number Industry
AAA 2018 xxxx Fossil
2019 xxxx Fossil
2020 xxxx Fossil
BBB 2018 yyyy Materials
2019 yyyy Services
2020 yyyy Materials
CCC 2018 zzzz Services
2019 zzzz Services
2020 zzzz Services
Tickers (level 0 of MultiIndex) are used to identify individual and unique units in the panel. Each unit is observed over 3 years (level 1 of MultiIndex).
When I groupby('Industry')
I end up double-counting the units since the same ticker is associated with more than one industry (as with ticker 'BBB').
The goal is to identify and print the tickers having this issue, and to assign them to a single industry.
I'm thinking of some code that returns the ticker if the string in the industry column is not unique, so that I can manually change it later.
Thanks for your help!
PS This is my first question here so pls let me know if you want me to be more specific or show more details about the df
If all of the values for Industry
should be the same for each Ticker
then you should do this the other way round.
Instead of using groupby()
on Industry
, use groupby()
on Ticker
and then loop through the data frames and return only those for which grouped_df.Ticker.nunique() > 1