I have a df
id column_int column_int column_A column_B column_C column_D
0 1 int int ABC ABC Keep na
1 2 int int ABC ABC ABC ABC
2 3 int int ABC Save na na
3 4 int int ABC Keep na na
4 5 int imt ABC ABC ABC ABC
.
.
Where column_int
are columns that contain ints and column A-D
contain text values. I want to keep only the rows that have either Keep
or Save
as row values
Before:
id column_int column_int column_A column_B column_C column_D
0 1 int int ABC ABC Keep na
1 2 int int ABC ABC ABC ABC
2 3 int int ABC Save na na
3 4 int int ABC Keep na na
4 5 int imt ABC ABC ABC ABC
After:
id column_int column_int column_A column_B column_C column_D
0 1 int int ABC ABC Keep na
2 3 int int ABC Save na na
3 4 int int ABC Keep na na
I tried the following
for column in df:
if type(column) == object:
df = df[df[column].str.contains('Save')] | df[df[column].str.contains('Keep')]
else:
pass
You can use use .apply()
on the selected columns, then for each column check for Save
or Keep
by str.contains
. Then, use .any()
on axis=1
(for row-wise operation) to check if the row contains such strings.
Finally, filter by .loc
, as follows:
cols = ['column_A', 'column_B', 'column_C', 'column_D']
df.loc[df[cols].apply(lambda x: x.str.contains(r'Save|Keep')).any(axis=1)]
Result:
id column_int column_int.1 column_A column_B column_C column_D
0 1 int int ABC ABC Keep na
2 3 int int ABC Save na na
3 4 int int ABC Keep na na