Search code examples
pandasmatrixpivot-table

Pandas: Dataframe Crosstab Output To Matrix


I have a simple dataframe with two columns and I would like to generate an output table showing the differences between rows as a matrix?

df = pd.DataFrame({'Subject': ['Alpha', 'Bravo', 'Charlie'],
                    'Total': [31.05590, 32.91925, 36.02484]})
Subject Total
Alpha 31.05590
Bravo 32.91925
Charlie 36.02484

The expected output is basically the differences between rows.

For example:

Alpha-Charlie = Alpha(Total) - Charlie(Total) -> (31.05590 - 36.02484) = -4.96894 and

Bravo-Alpha = Bravo(Total) - Alpha(Total) -> (32.91925 - 31.05590) = 1.86335.

Subject Alpha Bravo Charlie
Alpha 0.00000 -1.86335 -4.96894
Bravo 1.86335 0.00000 -3.10559
Charlie 4.96894 3.10559 0.00000

So far, I have not made much progress!

ct_a = pd.crosstab(df['Subject'], df['Subject'], values=df['Total'], aggfunc=np.sum).fillna(0)
ct_a.to_csv('data/ct_a.csv', index=True)

This code generates the following output:

Subject Alpha Bravo Charlie
Alpha 31.05590 0.00000 0.00000
Bravo 0.00000 32.91925 0.00000
Charlie 0.00000 0.00000 36.02484

I would welcome any advice on how to proceed?


Solution

  • Here is an approach using pandas.merge and pandas.crosstab :

    out = (
            df.merge(df, how="cross", suffixes=("", "_c"))
                .assign(Total= lambda x: x["Total"].sub(x["Total_c"]))
                .pipe(lambda d: pd.crosstab(d["Subject"], d["Subject_c"],
                                            values=d["Total"], aggfunc="sum"))
                .rename_axis(None, axis=1)
                .reset_index()
          )
    

    # Output :

    print(out)
    
       Subject    Alpha    Bravo  Charlie
    0    Alpha  0.00000 -1.86335 -4.96894
    1    Bravo  1.86335  0.00000 -3.10559
    2  Charlie  4.96894  3.10559  0.00000