Search code examples
excelpandasdataframerows

Locating row by value in multiple columns


This is my DataFrame (df): enter image description here

i want to iterate over my rows, and if i see a '999' value somewhere (id excluded):

  1. I need to make sure the entire row is '999'.
  2. I need to make sure the 2nd row that has the same id value is also covered with '999's.

for example, for id=5, i have a row with 999 in it, and the second row where id=5 does not have '999's.

Expected output: enter image description here

this is what i have:

num_of_p = len(df.columns) - 1
for v in df.index:
    if (sum(df.iloc[v] == 999) != num_of_p):
        if (sum(df.iloc[v] == 999) != 0):
          raise Exception("***** value 999 should apply to the entire row -please check and re-run*****")

and this code works for my first condition. I'm having trouble figuring the second one. Any help will be greatly appreciated!


Solution

  • You can use boolean indexing:

    m = df.loc[:, "p1":].apply(lambda x: 999 in x.values, axis=1)
    df.loc[df["id"].isin(df.loc[m, "id"]), "p1":] = 999
    print(df)
    

    Prints:

       id   p1   p2   p3   p4
    0   2    0    0    0    0
    1   2    1    1    1    1
    2   4  999  999  999  999
    3   4  999  999  999  999
    4   5  999  999  999  999
    5   5  999  999  999  999
    6   9    1    1    1    1
    7   9    0    0    0    0
    

    EDIT: To get single "rows" that contains 999 (assuming there are always twins):

    m = df.loc[:, "p1":].apply(lambda x: 999 in x.values, axis=1)
    
    x = df.loc[m, "id"].value_counts()
    print('Rows that contain 999 and are "single":')
    print(x[x == 1].index.values)
    

    Prints:

    Rows that contain 999 and are "single":
    [5]