Tags: python, pandas, dataframe, boolean-expression

Count conditions within each row in Pandas Dataframe


I have this dataframe:

   Char1  Char2  Char3
0      2      2      3
1      2      3      3
2      2      3      3
3      2      2      2

I need to create three columns (_1, _2, _3) that count, for each row, the occurrences of the values 1, 2 and 3 respectively.

So the output would look like this:

   Char1  Char2  Char3  _1  _2  _3
0      2      2      3   0   2   1
1      2      3      3   0   1   2
2      2      3      3   0   1   2
3      2      2      2   0   3   0

For example:

  • the column _1 is always equal to 0 because there are no 1's in any row
  • for the column _2:
    • it is equal to 2 in the first row because there are two 2's in the first row
    • it is equal to 1 in the second row because there is one 2 in the second row
    • and so on for the remaining rows

How can I code that in Python?

I have tried this code (for _1 only):

df['_1'] = df[df.Char1 == 1].sum() + df[df.Char2 == 1].sum() + df[df.Char3 == 1].sum()

but I get "NaN".

DF constructor:

import pandas as pd

df = pd.DataFrame({'Char1':[2,2,2,2], 'Char2':[2,3,3,2], 'Char3':[3,3,3,2]})

Solution

  • You can iterate over [1,2,3], use eq to flag the cells equal to each number, and sum across the columns (axis=1) to get the count per row:

    for num in [1,2,3]:
        df[f"_{num}"] = df[['Char1','Char2','Char3']].eq(num, axis=1).sum(axis=1)
    

    Output:

       Char1  Char2  Char3  _1  _2  _3
    0      2      2      3   0   2   1
    1      2      3      3   0   1   2
    2      2      3      3   0   1   2
    3      2      2      2   0   3   0
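
The same approach can be wrapped in a small reusable helper. This is only a sketch: the function name `add_value_counts` and its parameters are made up for illustration and are not part of pandas. It assumes the columns hold plain scalar values that can be compared with `==`.

```python
import pandas as pd


def add_value_counts(df, columns, values):
    """Return a copy of df with one '_<value>' column per value,
    holding how many of the given columns equal that value in each row.

    Hypothetical helper, named here for illustration only.
    """
    out = df.copy()
    for value in values:
        # eq() gives a boolean frame; summing along axis=1 counts the
        # True cells in each row.
        out[f"_{value}"] = df[columns].eq(value).sum(axis=1)
    return out


df = pd.DataFrame({'Char1': [2, 2, 2, 2],
                   'Char2': [2, 3, 3, 2],
                   'Char3': [3, 3, 3, 2]})

result = add_value_counts(df, ['Char1', 'Char2', 'Char3'], [1, 2, 3])
print(result)
```

As for the "NaN" in the original attempt: `df[df.Char1 == 1].sum()` sums down each column, so it returns a Series indexed by the column names (Char1, Char2, Char3) rather than by row. Assigning that Series to `df['_1']` aligns it on the row index 0 to 3, none of the labels match, and every row ends up as NaN.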