Search code examples
pandascrosstabsubtotal

Trying to get subtotals from a pandas dataframe


I'm doing cross-tabulation between two columns in the dataframe. Here's a sample from the columns:

 column_1    column_2

    A          -8
    B          95
    A         -93
    D          11
    C         -62
    D         -14
    A         -55
    C          66
    B          76
    D         -49

I'm looking for a code that returns sub totals for A, B, C and D. For instance, for A the subtotal will be -156 (-8-93-55 = -156).

I tried to do that with pandas.crosstab() function:

pandas.crosstab(df[column_1], df[column_2], margins=True, margins_name=column_1).Total

Here's a sample of the output:

     -271  -263  -241  -223  -221  -212  -207  -201 ...   sum_column
A      1     0     1     0     0     1     0     0  ...      ##
B      0     0     0     1     0     0     0     0  ...      ##
C      0     0     0     0     1     0     0     1  ...      ##
D      0     1     0     0     0     0     1     0  ...      ##

The sum column consists of the sums of the boolean values in each row, instead of the sub totals for each of the four letters. I saw once that a boolean table can be used for calculations but I quite sure that by changing the pandas.crosstab() command the desired output can be achieved.

I'd be happy to get some ideas and thoughts from you.

Thanks.


Solution

  • If you'd simply like the totals by the individual categories in column_1 (A, B, C, D), maybe a groupby and summation could be helpful! You would call the groupby on the column with your categories, and then call sum on the result, like this:

    df.groupby('column1').sum()