I'm doing cross-tabulation between two columns in the dataframe. Here's a sample from the columns:
column_1 column_2
A -8
B 95
A -93
D 11
C -62
D -14
A -55
C 66
B 76
D -49
I'm looking for a code that returns sub totals for A, B, C and D. For instance, for A the subtotal will be -156 (-8-93-55 = -156).
I tried to do that with pandas.crosstab()
function:
pandas.crosstab(df[column_1], df[column_2], margins=True, margins_name=column_1).Total
Here's a sample of the output:
-271 -263 -241 -223 -221 -212 -207 -201 ... sum_column
A 1 0 1 0 0 1 0 0 ... ##
B 0 0 0 1 0 0 0 0 ... ##
C 0 0 0 0 1 0 0 1 ... ##
D 0 1 0 0 0 0 1 0 ... ##
The sum column consists of the sums of the boolean values in each row, instead of the sub totals for each of the four letters. I saw once that a boolean table can be used for calculations but I quite sure that by changing the pandas.crosstab()
command the desired output can be achieved.
I'd be happy to get some ideas and thoughts from you.
Thanks.
If you'd simply like the totals by the individual categories in column_1
(A, B, C, D), maybe a groupby and summation could be helpful! You would call the groupby
on the column with your categories, and then call sum on the result, like this:
df.groupby('column1').sum()