Column totals appear in the wrong order in pandas crosstab

The code below builds a pd.crosstab from the Titanic dataset that ships with Seaborn. The column totals in the output table appear in the wrong order.

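import pandas as pd
import seaborn as sns

titanic = sns.load_dataset('titanic')

# Bin ages into (0, 15] -> "kid" and (15, 100] -> "adult"
# (named bins rather than bin to avoid shadowing the built-in)
bins = [0, 15, 100]
titanic["adult"] = pd.cut(titanic.age, bins, labels=["kid", "adult"])
pd.crosstab(titanic.survived, titanic.adult, normalize=True, margins=True)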

I expected the last row, which holds the column totals, to read 0.116246 / 0.883754 / 1.000000 for kid / adult / All, but it reads 0.883754 / 0.116246 / 1.000000 instead.
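A quick way to confirm which order is correct is to compute the column totals straight from the binned column, without crosstab. The sketch below uses a small hypothetical DataFrame in place of titanic so it runs without downloading the dataset; on the real data the same value_counts call should reproduce the expected kid / adult split, and isna().sum() counts the passengers whose age is missing.

```python
import pandas as pd

# Hypothetical stand-in for titanic[["survived", "age"]]; one age is missing
df = pd.DataFrame({
    "survived": [0, 0, 1, 1, 0, 1],
    "age": [8.0, 40.0, 12.0, 30.0, float("nan"), 25.0],
})

# Same binning as in the question: (0, 15] -> kid, (15, 100] -> adult
df["adult"] = pd.cut(df["age"], [0, 15, 100], labels=["kid", "adult"])

# pd.cut leaves a missing age as NaN in the binned column
missing = df["adult"].isna().sum()
print("missing ages:", missing)

# value_counts drops NaN by default, so these are shares of rows with a known age;
# reindex puts them in category order (kid, adult)
col_totals = df["adult"].value_counts(normalize=True).reindex(["kid", "adult"])
print(col_totals)
```

Here kid should come out as 0.4 and adult as 0.6 (2 and 3 of the 5 known ages), with one missing age.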

Solution

The reversal of the totals is caused by the NaN values in the original age column, which carry over into the binned adult column you created (pd.cut leaves a missing age as NaN). Add dropna=False to your pd.crosstab() call and the column totals come back in the right order:

pd.crosstab(titanic.survived, titanic.adult, dropna=False, normalize=True, margins=True)

adult          kid     adult       All
survived
0         0.047619  0.546218  0.616162
1         0.068627  0.337535  0.383838
All       0.116246  0.883754  1.000000
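One thing to watch in this output: the cells and the last row are shares of the 714 passengers with a known age (for example 0.047619 = 34/714), but the All column (0.616162 / 0.383838) works out to 549/891 and 342/891, i.e. it is taken over all 891 passengers. As a result a row's cells no longer add up to its All value (0.047619 + 0.546218 = 0.593837). This is presumably because, with dropna=False, rows with a missing age are kept when the margins are computed. If you want every part of the table over the same denominator, one workaround (a sketch, not a crosstab feature) is to drop the missing ages, take plain counts, and add the proportions and margins yourself. The DataFrame below is the same hypothetical stand-in as before.

```python
import pandas as pd

# Hypothetical stand-in for titanic[["survived", "age"]]; one age is missing
df = pd.DataFrame({
    "survived": [0, 0, 1, 1, 0, 1],
    "age": [8.0, 40.0, 12.0, 30.0, float("nan"), 25.0],
})
df["adult"] = pd.cut(df["age"], [0, 15, 100], labels=["kid", "adult"])

# Keep only rows with a known age so every cell and margin shares one denominator
known = df.dropna(subset=["adult"])

# Plain counts: no margins/normalize, so crosstab's margin code is not involved
counts = pd.crosstab(known["survived"], known["adult"])
counts.columns = counts.columns.astype(str)  # plain string labels so "All" can be added

# Divide by the grand total, then build both margins by summing
props = counts / counts.to_numpy().sum()
props["All"] = props.sum(axis=1)
props.loc["All"] = props.sum(axis=0)
print(props)
```

With titanic in place of df, the cells, the All column and the All row should then all be shares of the same 714 passengers, so each row and column adds up to its margin.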