i want to iterate over my rows, and if i see a '999' value somewhere (id excluded):
for example, for id=5, i have a row with 999 in it, and the second row where id=5 does not have '999's.
this is what i have:
num_of_p = len(df.columns) - 1
for v in df.index:
if (sum(df.iloc[v] == 999) != num_of_p):
if (sum(df.iloc[v] == 999) != 0):
raise Exception("***** value 999 should apply to the entire row -please check and re-run*****")
and this code works for my first condition. I'm having trouble figuring the second one. Any help will be greatly appreciated!
You can use boolean indexing:
m = df.loc[:, "p1":].apply(lambda x: 999 in x.values, axis=1)
df.loc[df["id"].isin(df.loc[m, "id"]), "p1":] = 999
print(df)
Prints:
id p1 p2 p3 p4
0 2 0 0 0 0
1 2 1 1 1 1
2 4 999 999 999 999
3 4 999 999 999 999
4 5 999 999 999 999
5 5 999 999 999 999
6 9 1 1 1 1
7 9 0 0 0 0
EDIT: To get single "rows" that contains 999 (assuming there are always twins):
m = df.loc[:, "p1":].apply(lambda x: 999 in x.values, axis=1)
x = df.loc[m, "id"].value_counts()
print('Rows that contain 999 and are "single":')
print(x[x == 1].index.values)
Prints:
Rows that contain 999 and are "single":
[5]