I have a panel data set that generally looks like this (it is much larger).
df['fips'] = ['01001', '01001', '01001', '01003', '01003', '01003', '01005', '01005', '01005']
df['year'] = [2001, 2002, 2003, 2001, 2002, 2003, 2001, 2002, 2003]
df['var1'] = [1, 2, 3, 4, 5, 6, 7, 8, 9]
df['var2'] = [1, 2, 3, 4, np.nan, 6, 7, 8, 9]
print(df)
Resulting dataframe:
fips year var1 var2
0 01001 2001 1 1.0
1 01001 2002 2 2.0
2 01001 2003 3 3.0
3 01003 2001 4 4.0
4 01003 2002 5 NaN
5 01003 2003 6 6.0
6 01005 2001 7 7.0
7 01005 2002 8 8.0
8 01005 2003 9 9.0
What I am hoping to do is remove the entire 'fips' panel that contains a nan value so it would look like this:
fips year var1 var2
0 01001 2001 1 1.0
1 01001 2002 2 2.0
2 01001 2003 3 3.0
3 01005 2001 7 7.0
4 01005 2002 8 8.0
5 01005 2003 9 9.0
I have tried creating a Multi-Index on 'fips' and 'year' and using dropna, but I am only able to pull out the row that actually contains the na value. Thanks in advance for any help!
You could check to make sure none of the var2
are null within the group
df.groupby('fips').filter(lambda x: ~x['var2'].isnull().any())
If you wanted to check both var cols you could try:
null_fips = df.loc[df[['var1','var2']].isnull().any(1)]['fips'].unique()
df.loc[~df['fips'].isin(null_fips)]