Search code examples
pythonpandasmatrixrowmultiple-columns

Transform Matrix - Showing all the 0 value


I have a huge dataframe. The data looks like this:

Person  Distance    BS
A       125.58      BS3
A       212.01      BS4
B       11.41       BS3
B       134.35      BS2
C       11.41       BS3
C       274.20      BS2
D       220.98      BS5
D       8.01        BS7
E       606.05      BS1
E       676.88      BS2
F       28.81       BS7
F       98.69       BS5
G       81.64       BS1
G       35.49       BS3

I transform this dataset into OD matrix based on this question Is it possible from dataframe transform to Matrix? with this code:

df = pd.read_csv("data.csv")
df = df[df.Distance < 100]
df = df[df.groupby('Person').Person.transform(len) > 1]
places = df["BS"].unique()
places.sort()
od_df = pd.DataFrame(df["BS"].values.reshape((-1, 2)), columns=["O", "D"])
od_matrix = pd.pivot_table(od_df, index="O", columns="D", aggfunc="size").reindex(index=places, columns=places)
od_matrix.fillna(0, downcast="infer", inplace=True)
od_matrix

I want to eliminate the distance above 100 meters. Therefore, I put distance < 100. The result looks like this:

D   BS1 BS3 BS5 BS7
O               
BS1 0   1   0   0
BS3 0   0   0   0
BS5 0   0   0   0
BS7 0   0   1   0

if my huge data is from BS1 to BS9, after elimination the matrix column and row no data (0) will dissapear too. How to show the all columns and rows event if no data (0)?. I want to show the matrix looks like this:

D   BS1 BS2 BS3 BS4 BS5 BS6 BS7 BS8 BS9
O                                   
BS1 0   0   1   0   0   0   0   0   0
BS2 0   0   0   0   0   0   0   0   0
BS3 0   0   0   0   0   0   0   0   0
BS4 0   0   0   0   0   0   0   0   0
BS5 0   0   0   0   0   0   0   0   0
BS6 0   0   0   0   0   0   0   0   0
BS7 0   0   0   0   1   0   0   0   0
BS8 0   0   0   0   0   0   0   0   0
BS9 0   0   0   0   0   0   0   0   0

Solution

  • First is reaasign first filtered DataFrame to df1, get unique places by list comprehension with f-strings and add fill_value=0 parameter to functions pivot_table and reindex:

    df1 = df[df.Distance < 100]
    df1 = df1[df1.groupby('Person').Person.transform(len) > 1]
    places = [f'BS{ x + 1}' for x in range(9)]
    print (places)
    ['BS1', 'BS2', 'BS3', 'BS4', 'BS5', 'BS6', 'BS7', 'BS8', 'BS9']
    
    
    od_df = pd.DataFrame(df1["BS"].values.reshape((-1, 2)), columns=["O", "D"])
    od_matrix = (pd.pivot_table(od_df, index="O", columns="D", aggfunc="size", fill_value=0)
                   .reindex(index=places, columns=places, fill_value=0))
    

    Or:

    od_matrix = (pd.crosstab(od_df["O"], od_df["D"])
                   .reindex(index=places, columns=places, fill_value=0))
    

    print (od_matrix)
    D    BS1  BS2  BS3  BS4  BS5  BS6  BS7  BS8  BS9
    O                                               
    BS1    0    0    1    0    0    0    0    0    0
    BS2    0    0    0    0    0    0    0    0    0
    BS3    0    0    0    0    0    0    0    0    0
    BS4    0    0    0    0    0    0    0    0    0
    BS5    0    0    0    0    0    0    0    0    0
    BS6    0    0    0    0    0    0    0    0    0
    BS7    0    0    0    0    1    0    0    0    0
    BS8    0    0    0    0    0    0    0    0    0
    BS9    0    0    0    0    0    0    0    0    0