I'm trying to drop rows from a dataframe if they 'partially' meet a certain condition.
By 'partially' I mean that only part of the cell's value (not the whole value) matches the condition.
Let's say that I have this dataframe.
>>> df
                            Title                                 Body
0     Monday report: Stock market     You should consider buying this.
1          Tuesday report: Equity                         XX happened.
2  Corrections and clarifications                           I'm sorry.
3                Today's top news  Yes, it skyrocketed as I predicted.
I want to remove the entire row if the Title has "Monday report:" or "Tuesday report:".
One thing to note is that I used
TITLE = []
.... several lines of code to crawl the titles.
TITLE.append(headline)
to crawl the titles and store them in the dataframe.
Another thing is that my data are in tuples because I used
df = pd.DataFrame(list(zip(TITLE, BODY)), columns =['Title', 'Body'])
to make the dataframe.
I think that's why, when I used
df.query("'Title'.str.contains('Monday report:')")
I got an error.
When I did some googling here on StackOverflow, some answers advised converting the tuples into a multi-index and using filter(), drop(), or isin().
None of them worked.
Or maybe I used them the wrong way...?
Any ideas on how to solve this problem?
You can build a basic filter for the condition and then take the inverse of it using ~, e.g.:
df[~df['Title'].str.contains('Monday report')]
This gives you output that excludes all rows whose Title contains 'Monday report'.
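To cover both prefixes from the question in one pass, the same idea can be extended with a regex alternation. The following is only a minimal sketch: the sample data is rebuilt from the dataframe shown in the question, and the names pattern, mask, and filtered are illustrative, not from the original code. It also passes na=False on the assumption that missing titles should be kept rather than raise an error.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the crawled lists).
TITLE = [
    "Monday report: Stock market",
    "Tuesday report: Equity",
    "Corrections and clarifications",
    "Today's top news",
]
BODY = [
    "You should consider buying this.",
    "XX happened.",
    "I'm sorry.",
    "Yes, it skyrocketed as I predicted.",
]
df = pd.DataFrame(list(zip(TITLE, BODY)), columns=["Title", "Body"])

# "|" is regex alternation, so a title matching either prefix is flagged.
pattern = r"Monday report:|Tuesday report:"

# na=False treats missing titles as non-matching, so those rows are kept.
mask = df["Title"].str.contains(pattern, na=False)

# "~" inverts the boolean mask: keep only the rows that did NOT match.
filtered = df[~mask].reset_index(drop=True)
print(filtered)
```

Building the dataframe from zip(TITLE, BODY) still gives ordinary string columns, so no multi-index conversion should be needed. The df.query attempt most likely failed because 'Title' inside the quotes is read as a string literal instead of the column name.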