I have a dataframe with about 8000 records that I would like to go through and drop all rows where columns 1/1/2015 - 1/1/2030 are equal to 0 (columns are labelled as 1/1/2015, 1/1/2016, etc.). Here is my code:
find_zeros = 0
for index, row in df.iterrows():
for z in range(2015,2031):
find_zeros += row['1/1/%s' %(z)]
if (find_zeros <= 0):
df.drop(index, inplace=True)
I keep running into many issues so I want to know if you know how to make this code work or if there is a better way to do this.
For reference, here is an example of the df:
name 1/1/2015 1/1/2016 1/1/2017
a 0.0 0.0 0.0
b 0.0 0.0 1.0
c 0.0 0.0 0.0
d 1.0 0.0 0.0
e 0.0 0.0 1.0
f 0.0 0.0 0.0
Expected outcome:
name 1/1/2015 1/1/2016 1/1/2017
b 0.0 0.0 1.0
d 1.0 0.0 0.0
e 0.0 0.0 1.0
We can filter to keep values where not all the values are (eq) 0 in the subset of cols
:
cols = [f'1/1/{v}' for v in range(2015, 2018)] # change upper bound to 2031
df = df[~df[cols].eq(0).all(axis=1)]
or where any are not equal (ne) to 0 in the subset:
cols = [f'1/1/{v}' for v in range(2015, 2018)] # change upper bound to 2031
df = df[df[cols].ne(0).any(axis=1)]
*Note if creating a new DataFrame from this selection:
new_df = df[df[cols].ne(0).any(axis=1)]
copy
or reset_index
or loc
will be needed to avoid a later SettingWithCopyWarning
.
For example,
new_df = df[df[cols].ne(0).any(axis=1)].copy()
Sample Data:
df = pd.DataFrame({
'name': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f'},
'1/1/2015': {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0, 5: 0.0},
'1/1/2016': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0},
'1/1/2017': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 0.0}
})
name 1/1/2015 1/1/2016 1/1/2017
0 a 0.0 0.0 0.0
1 b 0.0 0.0 1.0
2 c 0.0 0.0 0.0
3 d 1.0 0.0 0.0
4 e 0.0 0.0 1.0
5 f 0.0 0.0 0.0
Sample Output:
name 1/1/2015 1/1/2016 1/1/2017
1 b 0.0 0.0 1.0
3 d 1.0 0.0 0.0
4 e 0.0 0.0 1.0