Search code examples
pythonpandasdataframecounttablecolumn

Pandas calculate the number of columns of a given name there are that have a value in a row


I have this dataset where I have some columns (not important to the calculations) and then many columns with same starting name. I want to calculate the sum of those columns per one row which contains else than NaN-value. The set looks something like this:

id something number1 number2 number3 number4
1 105 200 NaN NaN 50
2 300 2 1 1 33
3 20 1 NaN NaN NaN

So I want to create new column that contains the length of the number columns that have a value. So the final dataset would look like this:

id something number1 number2 number3 number4 sum_columns
1 105 200 NaN NaN 50 2
2 300 2 1 1 33 4
3 20 1 NaN NaN NaN 1

I know I can calculate the length of columns that start by specific name something like this:

df[df.columns[pd.Series(df.columns).str.startswith('number')]]

but I cant figure out, how can I add condition that there has to be other than NaN value and also how to apply it to every row. I think it could be done with lambda? but haven't succeeded yet.


Solution

  • # filter column on 'number' and count
    df['sum_columns']=df.filter(like='number').count(axis=1)
    df
    
        id  something   number1     number2     number3     number4     sum_columns
    0    1      105         200         NaN         NaN       50.0          2
    1    2      300           2         1.0         1.0       33.0          4
    2    3       20           1         NaN         NaN        NaN          1
    

    PS: Your first DF and second DF, the NaN count don't match. I used the second DF in the solution