I am working with a large panel DataFrame (df) of financial data, containing many companies (and their fundamental values) across many years. The df looks something like this:
       year ticker  return_y
0      1985   VLID   -0.5838
1      1985     KO    0.3245
2      1994    CTL   -0.3063
3      1996   DRYR   -0.1607
..      ...    ...       ...
1356   2002   CHUX   -0.2456
1357   1987    HRL   -0.0233
1358   2015     KO    0.2343
..      ...    ...       ...
56798  2017  AFMXF    0.0558
56799  2014    TER    0.0134
I know that some firms have only one or two years reported, and I am afraid that they will bias my analysis. Therefore, I would like to keep only those firms that have at least 3 years of data. Can anyone help me find a way to do that?
Thank you in advance!
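You can take care of this in a single line: call groupby.filter() with a lambda that keeps only the tickers with more than two rows (this counts rows, so it assumes one row per ticker per year):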
df.groupby(df.ticker).filter(lambda x: len(x) > 2)
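If a ticker can have more than one row for the same year, counting rows may overstate how many years it covers. Below is a minimal sketch of both variants on a small made-up frame (the values and the duplicated TER row are invented for illustration, not taken from the question's data). The second variant counts distinct years per ticker with transform("nunique") and applies a boolean mask, which also avoids calling a Python lambda once per group.

```python
import pandas as pd

# Toy data, made up for illustration only.
# TER has three rows but only two distinct years (2015 appears twice).
df = pd.DataFrame({
    "year":     [1985, 1986, 1987, 1985, 1994, 1995, 2015, 2015, 2016],
    "ticker":   ["KO", "KO", "KO", "VLID", "CTL", "CTL", "TER", "TER", "TER"],
    "return_y": [0.3245, 0.10, -0.05, -0.5838, -0.3063, 0.02, 0.0134, 0.0134, 0.04],
})

# Variant 1: keep tickers with at least 3 rows (same idea as the one-liner above).
by_rows = df.groupby("ticker").filter(lambda g: len(g) >= 3)

# Variant 2: keep tickers with at least 3 distinct years.
# transform("nunique") returns a Series aligned with df's index,
# so it can be used directly as a boolean mask.
years_per_ticker = df.groupby("ticker")["year"].transform("nunique")
by_years = df[years_per_ticker >= 3]

print(sorted(by_rows["ticker"].unique()))
print(sorted(by_years["ticker"].unique()))
```

With one row per ticker per year, both variants return the same result. They differ only when a firm-year is duplicated, as with TER here.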