Search code examples
pythongraphedge-list

Constructing undirected edge list


I'm trying to construct a simple edge list for an undirected graph containing all possible edges. I used to do it using the cartesian product of the node list by itself & then filtering out duplicated & self edges. This time the size of the input is too large to afford to store unnecessary edges momentarily. Thus, I'm trying to use nested loops to get the needed edges directly from the first time.

Here is the code I wrote:

node_list = ['A', 'B', 'C', 'D']
for i in node_list:
    for j in node_list:
        if i < j:
            source.append(i)
            target.append(j)
        
loop_data = pd.DataFrame({'source': source, 'target':target})
print(loop_data)

The output I get is pretty unexpected. Instead of saving the source & target nodes in their respective lists, the program is keeping both the source & the target nodes in both source & target columns. Here is the current state of the output.

   source target
0       A      A
1       B      B
2       A      A
3       C      C
4       A      A
5       D      D
6       B      B
7       C      C
8       B      B
9       D      D
10      C      C
11      D      D

This is the expected form of output (ignore row indexing) :

   source target
1       A      B
2       A      C
3       A      D
6       B      C
7       B      D
11      C      D

I can't find where the problem exists. The issue seems to be with appending to both of the source & target lists.


Solution

  • If you initialise each variable as the empty list ([]) the code works as expected. So I suggest you review how you are initialising the source and target variables:

    source = []  # empty list
    target = []  # different empty list
    node_list = ['A', 'B', 'C', 'D']
    
    for i in node_list:
        for j in node_list:
            if i < j:
                source.append(i)
                target.append(j)
    
    loop_data = pd.DataFrame({'source': source, 'target': target})
    print(loop_data) 
    

    Output

      source target
    0      A      B
    1      A      C
    2      A      D
    3      B      C
    4      B      D
    5      C      D
    

    Since you want to construct an undirected graph, I suggest you use itertools.combinations to generate the edges:

    import pandas as pd
    from itertools import combinations
    
    node_list = ['A', 'B', 'C', 'D']
    
    res = pd.DataFrame(data=combinations(node_list, r=2), columns=["source", "target"])
    print(res)
    

    Output

      source target
    0      A      B
    1      A      C
    2      A      D
    3      B      C
    4      B      D
    5      C      D
    

    The reason for using combinations is that it avoids the Python level for-loop therefore for larger lists it should be faster. It is also in my opinion more pythonic.

    Timings

    For a node_list of 676 elements, I get the following timings:

    %timeit c = combinations_approach(node_list)
    19.7 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    %timeit o = original_approach(node_list)
    41.2 ms ± 64.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

    where the functions combinations_approach and original_approach are:

    def original_approach(node_list):
        source = []
        target = []
    
        for i in node_list:
            for j in node_list:
                if i < j:
                    source.append(i)
                    target.append(j)
    
        return pd.DataFrame({'source': source, 'target': target})
    
    
    def combinations_approach(node_list):
        return pd.DataFrame(combinations(sorted(node_list), r=2), columns=["source", "target"])