Search code examples
pythonpandasnanmulti-indexpanel-data

How to remove entire panel if one value in panel is missing using python pandas?


I have a panel data set that generally looks like this (it is much larger).

df['fips'] = ['01001', '01001', '01001', '01003', '01003', '01003', '01005', '01005', '01005']

df['year'] = [2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003]

df['var1'] = [1, 2, 3, 4, 5, 6, 7, 8, 9]

df['var2'] = [1, 2, 3, 4, np.nan, 6, 7, 8, 9]

print(df)

Resulting dataframe:

    fips  year  var1  var2
0  01001  2001     1   1.0
1  01001  2002     2   2.0
2  01001  2003     3   3.0
3  01003  2001     4   4.0
4  01003  2002     5   NaN
5  01003  2003     6   6.0
6  01005  2001     7   7.0
7  01005  2002     8   8.0
8  01005  2003     9   9.0

What I am hoping to do is remove the entire 'fips' panel that contains a nan value so it would look like this:

    fips  year  var1  var2
0  01001  2001     1   1.0
1  01001  2002     2   2.0
2  01001  2003     3   3.0
3  01005  2001     7   7.0
4  01005  2002     8   8.0
5  01005  2003     9   9.0

I have tried creating a Multi-Index on 'fips' and 'year' and using dropna, but I am only able to pull out the row that actually contains the na value. Thanks in advance for any help!


Solution

  • You could check to make sure none of the var2 are null within the group

    df.groupby('fips').filter(lambda x: ~x['var2'].isnull().any())
    

    If you wanted to check both var cols you could try:

    null_fips = df.loc[df[['var1','var2']].isnull().any(1)]['fips'].unique()
    df.loc[~df['fips'].isin(null_fips)]