Search code examples
pythonpandasdataframepivot-tablefrequency

How can I create a cross-tab of two columns in a dataframe in Pandas and generate a total row and column in the output?


I have created a dataframe from a CSV file and now I'm trying to create a cross-tab of two columns ("Personal_Status" and "Gender"). The output should look like this including the frequencies of each gender/personal status combination and the totals of each generated row and column.

Personal Status  F    M    (blank)  Grand Total
Divorced         135  21            156
Married               36            36
Single                232  1        233
Grand Total      135  289  1        425

I tried this but it's not quite where I want it:

creditData[["Personal_Status", "Gender"]].value_counts()

The output includes a column of each "Personal_Status" value, a "Gender" column, and the frequency of each combination, i.e row 1 = "Single, M, 232"


Solution

  • Something like this?

    import pandas as pd
    
    df = pd.DataFrame({
        'Name': ['Kathy', 'Linda', 'Peter'],
        'Gender': ['F', 'F', 'M'],
        'Personal_Status': ['Divorced', 'Married', 'Married']})
    
    df2 = pd.crosstab(df['Personal_Status'], df['Gender'])
    
    df2.loc['Grand Total']= df2.sum(numeric_only=True, axis=0)
    df2.loc[:, 'Grand Total'] = df2.sum(numeric_only=True, axis=1)
    
    print(df2)
    

    Output

    Gender           F  M  Grand Total
    Personal_Status                   
    Divorced         1  0            1
    Married          1  1            2
    Grand Total      2  1            3