
Why is numpy fill_diagonal not changing dataframe values


I'm trying to use numpy.fill_diagonal() as shown in this answer to change the diagonal entries of my dataframe to be 20,000. However, when I run the example below, the diagonal entries in my dataframe stay zero. My question is "Why?"

Please don't post some answer showing how to change the diagonal entries with a for loop. I can change the diagonal entries myself. My question is why the below example does not work to change the entries. If I don't understand why the below example fails, I cannot rely on numpy.fill_diagonal() in any of my code. I'm using numpy version 2.1.0 and pandas version 2.2.2.

import numpy as np
import pandas as pd

nodes = ['A','B','C','D','E','F','G','H']
df_cap = pd.DataFrame(None, index=nodes, columns=nodes)

edges = {
    ('A', 'B'): 809, ('A', 'C'): 184, ('A', 'D'): 440, ('B', 'C'): 134, 
    ('B', 'E'): 277, ('B', 'F'): 138, ('C', 'D'): 194, ('C', 'F'): 144,
    ('C', 'G'): 139, ('D', 'E'): 190, ('D', 'G'): 284, ('E', 'F'): 100,
    ('E', 'H'): 281,('F', 'H'): 922,('G', 'F'): 123,('G', 'H'): 232
}

for i in nodes:
    for j in nodes:
        if (i,j) in edges:
            df_cap.loc[i,j] = edges[(i,j)]

nodes = nodes + ["Dummy"]


df_cap = df_cap.astype(float).fillna(0)
df_cap.loc["Dummy", :] = 0
df_cap.loc[:, "Dummy"] = 0
np.fill_diagonal(df_cap.values, 20000)

df_cap.loc["A", "Dummy"] = 20000
df_cap.loc["Dummy", "H"] = 20000
df_cap
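
The same failure can be reproduced in isolation with a much smaller frame. This is a minimal sketch, not the original code. It assumes only a square DataFrame whose columns have different dtypes. Such columns are stored in separate internal blocks, so df.values has to assemble a new array instead of returning a view:

```python
import numpy as np
import pandas as pd

# Square frame with one float64 column and one int64 column. Columns of
# different dtypes live in separate internal blocks, so there is no single
# 2-D array that df.values could be a view of.
df = pd.DataFrame({"x": [0.0, 0.0], "y": [0, 0]}, index=["x", "y"])

vals = df.values               # pandas assembles a new float64 array here
np.fill_diagonal(vals, 20000)  # fills *that* array in place (returns None)

print(vals)  # the temporary array has the new diagonal
print(df)    # df itself still holds zeros

# Every access builds a fresh array, so two calls never share memory.
print(np.shares_memory(df.values, df.values))  # False
```

np.fill_diagonal never returns anything useful. It only mutates its argument, so it is a no-op for the DataFrame whenever that argument is a temporary copy.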

Solution

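  • The issue is that df_cap is fragmented, i.e. it is not backed by a single (monobloc) underlying numpy array. You initialized an object DataFrame, filled it with a loop, and then inserted new labels such as the "Dummy" column. For such a DataFrame, df.values builds a new array (a copy, not a view), so np.fill_diagonal fills that temporary copy in place and df_cap itself is never modified.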

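    A simple workaround could be to make a copy before np.fill_diagonal, since a deep copy consolidates blocks of the same dtype. It cannot merge columns of different dtypes, so this only helps when all columns share one dtype: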

    df_cap = df_cap.copy()
    

    A better approach would be to build df_cap from edges directly, without a loop (this version leaves out the "Dummy" row and column):

    nodes = ['A','B','C','D','E','F','G','H']
    
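    # tuple keys -> MultiIndex; unstack moves the second node of each edge to the columns
    df_cap = (pd.Series(edges).unstack(fill_value=0)
                # add the labels missing on either axis (column 'A', row 'H')
                .reindex(index=nodes, columns=nodes, fill_value=0)
             )
    # built from a single array here, so (in pandas 2.x default mode) df_cap.values is a view
    np.fill_diagonal(df_cap.values, 20000)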
    

    And without fill_diagonal:

    # merge the diagonal entries into the edge dict (dict union needs Python 3.9+)
    df_cap = (pd.Series(edges | {(k, k): 20000 for k in nodes})
                .unstack(fill_value=0)
                .reindex(index=nodes, columns=nodes, fill_value=0)
             )
    

    Output:

           A      B      C      D      E      F      G      H
    A  20000    809    184    440      0      0      0      0
    B      0  20000    134      0    277    138      0      0
    C      0      0  20000    194      0    144    139      0
    D      0      0      0  20000    190      0    284      0
    E      0      0      0      0  20000    100      0    281
    F      0      0      0      0      0  20000      0    922
    G      0      0      0      0      0    123  20000    232
    H      0      0      0      0      0      0      0  20000
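
If you want a version that does not depend on whether df.values happens to be a view, one option is to fill an explicit copy and rebuild the frame. This is a minimal sketch. The helper name with_diagonal is hypothetical, and the code assumes the rows and columns are in the same order, so the positional diagonal is the label diagonal. It also sidesteps pandas' Copy-on-Write mode (opt-in in pandas 2.x; the pandas docs describe it as the default from pandas 3.0). Under that mode, arrays returned by .values/.to_numpy() that share data with the frame are read-only, so writing into them is not a reliable pattern there either.

```python
import numpy as np
import pandas as pd


def with_diagonal(df: pd.DataFrame, value) -> pd.DataFrame:
    """Return a new DataFrame equal to `df` with its main diagonal set to `value`.

    Hypothetical helper: fills an explicit copy instead of relying on
    df.values being a writable view of df's data.
    Assumes the row and column labels are in the same order.
    """
    arr = df.to_numpy(copy=True)  # always a private, writable array
    np.fill_diagonal(arr, value)  # in place, but on our own copy
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


# Mixed dtypes (float64 columns plus an int64 "Dummy" column): the case where
# np.fill_diagonal(df.values, ...) leaves the DataFrame unchanged.
nodes = ["A", "B", "Dummy"]
df_cap = pd.DataFrame(
    {"A": [0.0, 0.0, 0.0], "B": [809.0, 0.0, 0.0], "Dummy": [0, 0, 0]},
    index=nodes,
)
result = with_diagonal(df_cap, 20000)
print(result)
```

The trade-off is that the rebuilt frame has a single common dtype (float64 here), because to_numpy() upcasts mixed columns to one array dtype.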