I am new to Python and pandas.
I want to filter a DataFrame so that it keeps only the rows where one of the columns is not NaN.
I have tried:
a = dictionarydf.label.isnull()
but a is just populated with True or False values.
I also tried this:
dictionarydf.query(dictionarydf.label.isnull())
but it raised an error, as I expected.
Sample data:
    reference_word  all_matching_words         label   review
0   account         fees - account             NaN     N
1   account         mobile - account           NaN     N
2   account         monthly - account          NaN     N
3   administration  delivery - administration  NaN     N
4   administration  fund - administration      NaN     N
5   advisor         fees - advisor             NaN     N
6   advisor         optimum - advisor          NaN     N
7   advisor         sub - advisor              NaN     N
8   aichi           delivery - aichi           NaN     N
9   aichi           pref - aichi               NaN     N
10  airport         biz - airport              travel  N
11  airport         cfo - airport              travel  N
12  airport         cfomtg - airport           travel  N
13  airport         meeting - airport          travel  N
14  airport         summit - airport           travel  N
15  airport         taxi - airport             travel  N
16  airport         train - airport            travel  N
17  airport         transfer - airport         travel  N
18  airport         trip - airport             travel  N
19  ais             admin - ais                NaN     N
20  ais             alpine - ais               NaN     N
21  ais             fund - ais                 NaN     N
22  allegiance      custody - allegiance       NaN     N
23  allegiance      fees - allegiance          NaN     N
24  alpha           late - alpha               NaN     N
25  alpha           meal - alpha               NaN     N
26  alpha           taxi - alpha               NaN     N
27  alpine          admin - alpine             NaN     N
28  alpine          ais - alpine               NaN     N
29  alpine          fund - alpine              NaN     N
I want to keep only the rows where label is not NaN.
Expected output:
   reference_word  all_matching_words  label   review
0  airport         biz - airport       travel  N
1  airport         cfo - airport       travel  N
2  airport         cfomtg - airport    travel  N
3  airport         meeting - airport   travel  N
4  airport         summit - airport    travel  N
5  airport         taxi - airport      travel  N
6  airport         train - airport     travel  N
7  airport         transfer - airport  travel  N
8  airport         trip - airport      travel  N
You can use dropna:
df = df.dropna(subset=['label'])
print(df)
    reference_word  all_matching_words  label   review
10  airport         biz - airport       travel  N
11  airport         cfo - airport       travel  N
12  airport         cfomtg - airport    travel  N
13  airport         meeting - airport   travel  N
14  airport         summit - airport    travel  N
15  airport         taxi - airport      travel  N
16  airport         train - airport     travel  N
17  airport         transfer - airport  travel  N
18  airport         trip - airport      travel  N
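Note that dropna keeps the original index labels (10 to 18 here), while the expected output in the question is numbered from 0. If that matters, chaining reset_index(drop=True) should renumber the rows. A minimal sketch, assuming a small hand-built stand-in for the question's DataFrame (the four sample rows and the name filtered are illustrative only):

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data (only a few rows reproduced).
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "cfo - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# Drop rows whose label is NaN, then renumber the index starting at 0.
filtered = df.dropna(subset=["label"]).reset_index(drop=True)
print(filtered)
```

Without drop=True, reset_index would keep the old labels as an extra column named index.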
Another solution is boolean indexing with notnull:
df = df[df.label.notnull()]
print(df)
    reference_word  all_matching_words  label   review
10  airport         biz - airport       travel  N
11  airport         cfo - airport       travel  N
12  airport         cfomtg - airport    travel  N
13  airport         meeting - airport   travel  N
14  airport         summit - airport    travel  N
15  airport         taxi - airport      travel  N
16  airport         train - airport     travel  N
17  airport         transfer - airport  travel  N
18  airport         trip - airport      travel  N
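As for the query attempt in the question: query expects a string expression rather than a boolean Series, which is most likely why passing dictionarydf.label.isnull() raised an error. A sketch of a string-based version, again on a small stand-in frame (the sample rows and the names mask and via_query are made up for illustration). engine="python" is passed on the assumption that your pandas version needs it to evaluate method calls inside the expression:

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data (only a few rows reproduced).
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "cfo - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# notnull() returns a boolean Series: one True/False per row.
mask = df["label"].notnull()

# query() takes the condition as a string expression instead.
via_query = df.query("label.notnull()", engine="python")

# Check whether both routes select the same rows.
print(via_query.equals(df[mask]))
```

The boolean Series from isnull or notnull is not a wrong result by itself; it only needs to be used as a mask, as in df[mask].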