Search code examples
pythonpandasdataframexlsx

Transform data from column vector to matrix


So effectively I've got a xlsx file that looks like this:

Label-Label   Data
A-B           1
C-D           5
A-D           4
B-F           10
H-I           12

So my goal is to use the "Label-Label" column to make this data a matrix by splitting on the "-". What I mean by that is to go from the above data to this:

  B D F I
A 1 4 0 0
C 0 5 0 0
B 0 0 10 0
H 0 0 0 12

The goal is to have it such that each (row, column) corresponds to the "row-column" that is the "label-label". Any row-column combination that's not in the initial dataset would just be set to 0 here and any repeat row values in the initial data get stored in the same row in the matrix data (e.g. A). Likewise, any repeat column values store their values in the same column in the matrix (e.g. D).


Solution

  • Start by reading into a pandas dataframe, and then do a pivot_table:

    (df['Label-Label'].str.split('-', expand=True)
                      .assign(data=df.Data)
                      .pivot_table('data',0,1, fill_value=0))
    
    1    B    D     F     I
    0                      
    A  1.0  4.0   0.0   0.0
    B  0.0  0.0  10.0   0.0
    C  0.0  5.0   0.0   0.0
    H  0.0  0.0   0.0  12.0