python pandas numpy finance

Panel data - Keep companies which have at least 3 years of data in Pandas


I am working with a large panel DataFrame (df) of financial values, covering many companies (and their fundamental values) across many years. The df looks something like this:

        year     ticker     return_y
0       1985      VLID       -0.5838
1       1985        KO        0.3245
2       1994       CTL       -0.3063
3       1996      DRYR       -0.1607
..       ...       ...           ...
1356    2002      CHUX       -0.2456
1357    1987       HRL       -0.0233
1358    2015        KO        0.2343
..       ...       ...           ...
56798   2017      AFMXF       0.0558
56799   2014        TER       0.0134

I know that some firms have only one or two years reported, and I am afraid they will bias my analysis. Therefore, I would like to keep only those firms that have at least 3 years of data. Can anyone help me find a way to do that?

Thank you in advance!


Solution

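  • You can take care of this in a single line: group by ticker and call filter() with a lambda that keeps only the groups with at least three rows. filter() returns a new DataFrame rather than modifying df in place, so assign the result:

    df = df.groupby('ticker').filter(lambda x: len(x) > 2)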
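
One caveat: len(x) counts rows, not years, so the one-liner matches "at least 3 years" only if each ticker has at most one row per year. Here is a minimal sketch of both variants on a small made-up frame (the values below are hypothetical toy data, not taken from the question's df). If a ticker can have repeated years, counting distinct years with transform("nunique") may be closer to what you want:

```python
import pandas as pd

# Hypothetical toy data in the same shape as the question's df.
# TER has three rows but only two distinct years (2015 appears twice).
df = pd.DataFrame({
    "year":     [1985, 1986, 1987, 1985, 1986, 2015, 2015, 2016],
    "ticker":   ["KO", "KO", "KO", "VLID", "VLID", "TER", "TER", "TER"],
    "return_y": [0.32, 0.10, 0.23, -0.58, 0.05, 0.01, 0.02, 0.03],
})

# Variant 1: keep tickers with at least three rows.
by_rows = df.groupby("ticker").filter(lambda g: len(g) >= 3)

# Variant 2: keep tickers with at least three distinct years.
# transform broadcasts each group's count back onto its rows,
# which gives a boolean mask aligned with df.
n_years = df.groupby("ticker")["year"].transform("nunique")
by_years = df[n_years >= 3]

print(sorted(by_rows["ticker"].unique()))
print(sorted(by_years["ticker"].unique()))
```

On this toy frame the row-count variant keeps KO and TER, while the distinct-year variant keeps only KO. The transform version also avoids calling a Python lambda once per group, which can matter on a frame with tens of thousands of rows.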