Adding a column in python based on multiple iterating conditions or looping


I'd like to create an extra column in Python, using either Pandas or NumPy, based on iterating conditions (I think that's the way to do it). **If any value is False among the rows with the same IDxx, the extra column is IN for all of those rows; otherwise it is OUT.**


| IDxx   | Tru/Fal  |
| ------ | -------- |
| 164    | True     |
| 164    | False    |
| 164    | False    |
| 165    | True     |
| 165    | True     |
| 165    | True     |
| 166    | False    |
| 166    | True     |
| 166    | True     |
| 167    | True     |
| 167    | True     |
| 167    | False    |

I tried a few options but I'm running out of ideas. Because the IDxx values differ, I can't get the loop working. There are only 4 IDxx values in this example, but in my real case there are hundreds. I'd like the output to be the following:

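| IDxx   | Tru/Fal  | Answer |
| ------ | -------- | ------ |
| 164    | True     | IN     |
| 164    | False    | IN     |
| 164    | False    | IN     |
| 165    | True     | OUT    |
| 165    | True     | OUT    |
| 165    | True     | OUT    |
| 166    | False    | IN     |
| 166    | True     | IN     |
| 166    | True     | IN     |
| 167    | True     | IN     |
| 167    | True     | IN     |
| 167    | False    | IN     |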

Solution

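  • Use groupby to flag each IDxx, then map the flag onto the rows as IN/OUT as follows. (Mapping the dictionary alone yields True/False rather than the IN/OUT labels.)

    # key: IDxx, value: True if that IDxx group contains at least one False
    idx_gb = (df.groupby('IDxx')['Tru/Fal'].min() == False).to_dict()
    # look up each row's IDxx flag, then translate it to the IN/OUT label
    df['Answer'] = df['IDxx'].map(idx_gb).map({True: 'IN', False: 'OUT'})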
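
An alternative that skips the intermediate dictionary is to broadcast a per-group check back onto every row with `transform`. The following is a minimal sketch, not the answer's original method. It assumes the `Tru/Fal` column holds real booleans (not the strings "True"/"False") and rebuilds the sample data from the question so that it runs on its own; the name `all_true` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "IDxx": [164, 164, 164, 165, 165, 165, 166, 166, 166, 167, 167, 167],
    "Tru/Fal": [True, False, False, True, True, True,
                False, True, True, True, True, False],
})

# For every row: True only if all values in that row's IDxx group are True.
all_true = df.groupby("IDxx")["Tru/Fal"].transform("all")

# A group with at least one False is labelled IN; otherwise OUT.
df["Answer"] = np.where(all_true, "OUT", "IN")

print(df)
```

Because `transform` returns a result aligned with the original index, no explicit loop over the IDxx values is needed, so this should work the same way with hundreds of distinct IDs.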