Search code examples
pythonpandasdataframedata-analysis

Count non-empty cells in pandas dataframe rows and add counts as a column


Using Python, I want to count the number of cells in a row that has data in it, in a pandas data frame and record the count in the leftmost cell of the row.

Frame showing count column on left as requested


Solution

  • To count the number of cells missing data in each row, you probably want to do something like this:

    df.apply(lambda x: x.isnull().sum(), axis='columns')
    

    Replace df with the label of your data frame.

    You can create a new column and write the count to it using something like:

    df['MISSING'] = df.apply(lambda x: x.isnull().sum(), axis='columns')
    

    The column will be created at the end (rightmost) of your data frame.

    You can move your columns around like this:

    df = df[['Count', 'M', 'A', 'B', 'C']]
    

    Update

    I'm wondering if your missing cells are actually empty strings as opposed to NaN values. Can you confirm? I copied your screenshot into an Excel workbook. My full code is below:

    df = pd.read_excel('count.xlsx', na_values=['', ' '])
    df.head() # You should see NaN for empty cells
    df['M']=df.apply(lambda x: x.isnull().sum(), axis='columns')
    df.head() # Column M should report the values: first row: 0, second row: 1, third row: 2
    df = df[['Count', 'M', 'A', 'B', 'C']]
    df.head() # Column order should be Count, M, A, B, C
    

    Notice the na_values parameter in the pd.read_excel method.