Dropping index based on its length


I have panel time series data with a MultiIndex of country ID and year:

import numpy as np
import pandas as pd

# First level: country ID, second level: year
arrays = [['country i', 'country i', 'country i', 'country j', 'country j', 'country j', 'country e'],
          [1999, 2000, 2001, 1999, 2000, 2001, 2000]]

tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=["country ID", "year"])

dfx = pd.Series(np.random.randn(7), index=index)

print(dfx)

country ID  year
country i   1999    0.572030
            2000    1.736893
            2001   -1.213016
country j   1999    0.167581
            2000   -1.178015
            2001   -1.470233
country e   2000    1.298953
dtype: float64

I want to drop, for example, all country IDs that have fewer than 2 observations. How can I filter the data so that no country ID with fewer than 2 observations remains? In the above example, country e should be dropped from the dataset.

Thank you in advance!


Solution

  • One approach is:

    mask = dfx.groupby(level=0).transform("count") >= 2
    print(dfx[mask])
    

    Output (the values differ from the question because the data is random)

    country ID  year
    country i   1999   -1.259176
                2000    0.123215
                2001    0.899501
    country j   1999   -0.111309
                2000    2.260785
                2001   -0.460683
    dtype: float64
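
    To see why the mask works, it may help to look at the intermediate counts and compare with an alternative. The sketch below rebuilds the example with fixed values (np.arange instead of random numbers, an assumption made only so the result is reproducible), shows the per-row group sizes that transform("count") broadcasts back onto the original index, and then does the same filtering with groupby(...).filter. The names counts and filtered are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

arrays = [
    ["country i", "country i", "country i",
     "country j", "country j", "country j", "country e"],
    [1999, 2000, 2001, 1999, 2000, 2001, 2000],
]
index = pd.MultiIndex.from_arrays(arrays, names=["country ID", "year"])

# Fixed values instead of random ones, so the example is reproducible
dfx = pd.Series(np.arange(7, dtype=float), index=index)

# transform("count") returns one value per row: the number of non-missing
# observations in that row's group, aligned with the original index
counts = dfx.groupby(level="country ID").transform("count")
masked = dfx[counts >= 2]

# Alternative: filter() keeps or drops whole groups based on a predicate
filtered = dfx.groupby(level="country ID").filter(lambda g: len(g) >= 2)

print(counts)
print(filtered)
```

    Both approaches keep the rows in their original order. One difference to be aware of: "count" ignores missing values, while len(g) counts every row, so the two can disagree if the series contains NaN.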