Search code examples
pythonpandaspivot-table

Calculate all missing values for specific data using pivot tables in pandas


I am working on this dataset called titanic.csv Let's simplify the problem and include some data here:

[![enter image description here](https://i.sstatic.net/LKGk47dr.png)](https://i.sstatic.net/LKGk47dr.png)

I need to calculate all missing values for child, as you see it is a value under who column. This should be done using a pivot table.

I have tried this solution:

pd.pivot_table(df[df['who'] == 'child'], 
index='sex', 
aggfunc=lambda x: x.isnull().sum(), 
 margins=True) # to sum all missing values based on gender

But I get this output: output when using the above code in which as you also notice, ALL row doesn't sum all missing values per gender.

Where is the problem in my code? Should I use another way to create the pivot table?


Solution

  • EDIT:

    If you prefer to use a pivot table, just add the parameter dropna=False to get the result you want.


    First answer:

    If you want the number of missing values per features for only the child you can use isna/isnull directly after filtering:

    data = {'survived': [0, 1, 1, 1, 0], 
            'pclass': [3, 1, None, 1, 3], 
            'sex': ['male', 'female', 'female', 'female', 'male'], 
            'age': [22, 38, None, None, 35], 
            'class_': ['Third', 'First', None, 'First', 'Third'], 
            'who': ['man', 'woman', 'child', 'child', 'man'], 
            'deck': [None, 'C', None, 'C', None], 
            'alive': ['no', 'yes', 'yes', 'yes', 'no'], 
            'alone': [False, False, True, False, True] } 
    df = pd.DataFrame(data)
    
    display(df[df["who"] == "child"].isna().sum())
    
    survived    0
    pclass      1
    sex         0
    age         2
    class_      1
    who         0
    deck        1
    alive       0
    alone       0