This code builds a pd.crosstab from the Titanic dataset that ships with Seaborn, but the column sums in the output table appear to be in the wrong order.
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
bins = [0, 15, 100]  # (0, 15] -> "kid", (15, 100] -> "adult"; renamed to avoid shadowing the builtin bin()
titanic["adult"] = pd.cut(titanic.age, bins, labels=["kid", "adult"])
pd.crosstab(titanic.survived, titanic.adult, normalize=True, margins=True)
I expected the last row, where the column sums should be, to read 0.116246 / 0.883754 / 1.000000, but it gives 0.883754 / 0.116246 / 1.000000.
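As a cross-check on the expected numbers, the column totals can be computed without relying on margins=True: build the normalized table without margins and sum each column. This is a minimal sketch on a small hypothetical DataFrame (df, with made-up ages), not the Titanic data; on the real dataset the same calls would take titanic.survived and titanic.adult instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for the Titanic columns (NOT the real dataset);
# two ages are missing, as some are in titanic.age
df = pd.DataFrame({
    "survived": [0, 1, 0, 1, 0, 1, 0],
    "age": [4.0, 30.0, np.nan, 12.0, 50.0, np.nan, 40.0],
})
# Same binning as in the question: (0, 15] -> "kid", (15, 100] -> "adult";
# missing ages stay NaN
df["adult"] = pd.cut(df["age"], [0, 15, 100], labels=["kid", "adult"])

# Normalized crosstab WITHOUT margins; rows whose bin is NaN are left out
ct = pd.crosstab(df["survived"], df["adult"], normalize=True)

# Column totals summed by hand, in category order (kid, then adult):
# these are the values the "All" row of the margins version should hold
col_totals = ct.sum(axis=0)
print(col_totals)
```

Because the totals come from ct.sum(axis=0), they follow the column order of ct itself, so they cannot end up attached to the wrong column.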
The flipped totals are simply due to the NaN values in the original age column, which carry over into the binned adult column you created (pd.cut has no bin for a missing age, so it leaves NaN). Just add dropna=False to your pd.crosstab() call, and it returns the right result:
pd.crosstab(titanic.survived, titanic.adult, dropna=False, normalize=True, margins=True)
adult          kid     adult       All
survived
0         0.047619  0.546218  0.616162
1         0.068627  0.337535  0.383838
All       0.116246  0.883754  1.000000
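A quick way to see the NaN values involved is to count them after binning: every NaN in age becomes NaN in adult. The sketch below uses a hypothetical toy Series rather than the real data; on the Titanic DataFrame the equivalent check would be titanic.adult.isna().sum().

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages (NOT the real Titanic data); two are missing
age = pd.Series([4.0, 30.0, np.nan, 12.0, 50.0, np.nan, 40.0], name="age")
adult = pd.cut(age, [0, 15, 100], labels=["kid", "adult"])

# pd.cut cannot bin a missing age, so each NaN in "age" is NaN in "adult"
n_missing = int(adult.isna().sum())
print(n_missing)  # 2

# Counts of the binned (non-missing) values, in category order
print(adult.value_counts(sort=False).to_dict())
```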