Search code examples
pythonlistgraphnetworkx

How transform colums with list to adjacency matrix


I have dataframe and one column contain lists about market basket. For example:

|receipt_no |items|
|--| --- |
|0 |  ['2672', '121431', '121', '49292', '1827']|
|1 |['331', '131', '121']|
|2 |['121', '49292', '1313']|
|3 |['131345', '24', '2314', '194', '91913']|

items contain list of basket articles

I want to create articles graph (weighted graph) with Networkx, but i don't know how create matrix of adjacency of articles. The cell should contain the number of checks in which this pair of articles met.

| |121|131|331|49292|...|
|--| --- |--| --- |--| --- |
|121| 0 |1| 1 |2| ... |
|131| 1|0| 1 |0| ... |
|331|1|1| 0 |0| ...|
|49292|2|0| 0 |0| ...|
|...|...|...| ... |...| ...|

Help me build an adjacency matrix based on checks to implement the graph


Solution

  • Following some discussion in comments, I now understand the OP's design of his 'adjacency matrix'. Properly formatted, it looks like this:

    121 131 331 49292 ...
    121 0 1 1 2 ...
    131 1 0 1 0 ...
    331 1 1 0 0 ...
    49292 2 0 0 0 ...
    ... ... ... ... ... ...

    This is a poor design because each time you need to find if two nodes are adjacent, you have to search through the node identifiers to find the row and column for both nodes - the performance will be terrible.

    On input you need to assign an index for each new node identifier you encounter, like this:

    ID index
    '2672' 0
    '121431' 1
    '121' 2
    ... ...

    The indices will be used to look up the rows and columns in the adjacency matrix - which is almost instant.

    The question does not specify if the graph is directed or undirected. From the sample 'adjacency matrix' posted, it looks like the intention is for an undirected graph - so the leading diagonal and the upper triangle of the matrix are redundant.

    Finally, it is not usual practice to store the edge weights in the adjacency matrix. One snag is that you cannot distinguish between a missing edge and an edge of weight 0. Usually, the edges are also given indices and the edge weight, along with any other edge data, is stored in another map. The adjacency matrix is boolean.

    So, after all that, here is how an adjacency matrix will look

    1
    1 1
    1 0 0
    

    You will find that most sample code for the standard graph theory algorithms expects an adjacency matrix like this. So , apart from performance and memory space issues, it will make your life simpler if you use this design.