I'm trying to write code that drops a row from a dataframe if the p-value (stored in the columns) is lower than 1.3 in at least 3 out of 5 columns. If the p-value is greater than 1.3 in 3 out of 5 columns, I keep the row. The code looks like this:
for i in np.arange(pvalue.shape[0]):
    if (pvalue.iloc[i,1:] < 1.3).count() > 2:
        pvalue.drop(index = pvalue.index[i], axis = 0, inplace = True)
    else:
        None
The pvalue dataframe has 6 columns: the first column is a string and the next 5 are p-values from an experiment. I get this error:
IndexError: single positional indexer is out-of-bounds
and I don't know how to fix this. I appreciate any help. BTW I'm a complete Python beginner, so be patient with me! :) Thanks and looking forward to your solutions!
I am not very knowledgeable about pandas, so there is probably a better way to go about it, but this should work:
By using iterrows(), you can iterate over each row of a DataFrame.
for idx, row in pvalue.iterrows():
In the loop you will have access to the idx variable, which is the index of the row you're currently iterating on, and the row values themselves in the row variable.
Then for every row, you can iterate through each column value with a simple for loop.
for val in row[1:]:
while making sure you start with the 2nd value (in other words, ignoring index 0 and starting with index 1).
The rest is pretty straightforward.
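threshold = 1.3
for idx, row in pvalue.iterrows():
    # Count how many of the p-value columns fall below the threshold
    count = 0
    for val in row[1:]:  # skip the first (string) column
        if val < threshold:
            count += 1
    # Drop the row when 3 or more of its values are below the threshold
    if count > 2:
        pvalue.drop(idx, inplace=True)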
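As for why the original loop fails, as far as I can tell there are two reasons. First, .count() on a boolean Series counts the non-missing entries rather than the True values, so the condition is always 5 > 2 and every row gets dropped. Second, np.arange(pvalue.shape[0]) is computed once up front while the dataframe shrinks with each drop, so the position i eventually points past the last row. Below is a minimal sketch of a loop-free alternative that sums the boolean comparison per row instead. The sample data and column names are made up for illustration, so adapt them to your own dataframe.

```python
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
pvalue = pd.DataFrame({
    "gene": ["a", "b", "c"],
    "p1": [0.5, 2.0, 1.0],
    "p2": [0.7, 1.5, 2.0],
    "p3": [1.0, 1.4, 2.5],
    "p4": [2.0, 0.2, 1.9],
    "p5": [3.0, 1.8, 0.1],
})

threshold = 1.3

# For each row, count how many of the five p-value columns are below
# the threshold. Summing booleans treats True as 1 and False as 0.
below = (pvalue.iloc[:, 1:] < threshold).sum(axis=1)

# Keep only the rows where at most two values are below the threshold.
filtered = pvalue[below <= 2]
print(filtered)
```

This builds a new dataframe instead of modifying pvalue while iterating over it, which avoids the out-of-bounds problem altogether.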