Search code examples
pythonpandasnumpymatrixcrosstab

Making a matrix-format from python


I have the following data in my dataframe B:

F1     F2    Count
A      C      5
B      C      2
B      U      6
C      A      1

I want to make a square matrix out of them so the results will be:

    A    B   C  U
A   0    0   6  0
B   0    0   2  6
C   6    2   0  0
U   0    6   0  0

I initially used pd.crosstab() but some variables in F1/F2 is missing in the matrix.

AC = 5 CA = 1 therefore the output should be 6.

Also pdcrosstab() does not recognize BU = UB, etc.

Anyone who could help? I am basically new to python.

Btw, this is my code:

wow=pd.crosstab(B.F1, 
            B.F2, 
            values=B.Count, 
            aggfunc='sum',
            ).rename_axis(None).rename_axis(None, axis=1)

Solution

  • You can make columns F1 and F2 categorical and use crosstab for the work.

    FDtype = pd.CategoricalDtype(list("ABCU"))
    
    df[["F1", "F2"]] = df[["F1", "F2"]].astype(FDtype)
    count = pd.crosstab(df["F1"], df["F2"], df["Count"], aggfunc='sum', dropna=False)
    count.fillna(0, inplace=True, downcast="infer")
    count += count.T
    

    Remark: it is more efficient to specify the column dtypes while the DataFrame is constructed