python network-programming networkx social-networking

How to make a network graph from normalised data


I am new to this area. I have data on actors and movies. [image]

I'm trying to do a network analysis and find communities. So I took my data, multiplied the matrix by its transpose and normalised the result. [image] Now I want to turn it into a network graph. I tried to do this with the networkx library but couldn't get it to work. I don't have any experience, so I'm open to all suggestions.


Solution

  • You can use NetworkX to map this network and hopefully I can help with this part!

    First, you will need to import the NetworkX library (using import networkx as nx).

    Next, import the data. Since your data is in matrix format (rather than a simple edge list), it is a little more complicated to import into NetworkX. I would suggest first converting the data into a NumPy array and then creating the graph with the NetworkX from_numpy_array function (called from_numpy_matrix in older releases; that name was removed in NetworkX 3.0).

    I will run through an example using dummy data. A screenshot of this data can be found here (a simplified version of your normalised dataset).

    This is the code I used to import the data and create the graph:

        import pandas as pd
        import networkx as nx

        # read the data; the first CSV column holds the row labels
        df = pd.read_csv('matrixdata.csv', sep=',', index_col=0)
        matrix = df.to_numpy()  # convert the data to a NumPy array
        # create the graph in networkx
        # (from_numpy_matrix was removed in NetworkX 3.0; use from_numpy_array)
        G = nx.from_numpy_array(matrix)
    

    Now I can print, for example, the number of edges in my network using print(len(G.edges())), which returns 5 (since I made a 5x5 matrix with only 5 connections).
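
    For anyone without the CSV file, here is a minimal self-contained sketch of the same steps. It assumes the dummy data is a 5x5 matrix with 0.5 on the diagonal and zeros elsewhere, which is my reading of the edge list printed further down; the DataFrame simply stands in for matrixdata.csv.

```python
import numpy as np
import pandas as pd
import networkx as nx

# Assumed stand-in for matrixdata.csv: 5x5, 0.5 on the diagonal, 0 elsewhere
df = pd.DataFrame(np.eye(5) * 0.5)

# Every non-zero cell becomes an edge; the cell value becomes its weight
G = nx.from_numpy_array(df.to_numpy())

print(G.number_of_nodes())
print(G.number_of_edges())
```

    G.number_of_edges() gives the same count as len(G.edges()).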

    From here, we can perform a number of measures on the network (e.g. density, degree etc.).
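
    As a rough sketch of what those measures look like in code, the example below uses a small made-up symmetric matrix (the values are purely illustrative, not your data) and computes the density, the plain degree and the weighted degree of each node.

```python
import numpy as np
import networkx as nx

# Hypothetical symmetric weight matrix for 4 nodes (illustrative values only)
A = np.array([
    [0.0, 0.5, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
])
G = nx.from_numpy_array(A)

density = nx.density(G)                      # share of possible ties that exist
degree = dict(G.degree())                    # number of ties per node
strength = dict(G.degree(weight="weight"))   # sum of tie weights per node

print(density)
print(degree)
print(strength)
```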

    However, a word of caution about your data as pictured above. In the normalised version, the node IDs have changed from films (in rows) and actors (in columns) to plain integers, starting from 0 in both the rows and the columns. NetworkX treats a square matrix as an adjacency matrix, so it will assume that node 0 in your first column is the same node as node 0 in your first row. Since you multiplied the matrix by its transpose, this is possibly your intention, but it is worth flagging up. One repercussion is that any non-zero value on the diagonal is treated as a 'self-directed' connection (a self-loop), i.e. node 0 connects to node 0, node 1 connects to node 1 and so on, which is what happens in my dummy example. Non-zero values off the diagonal become ties between two different nodes.
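
    If you would rather keep readable node names and drop the self-loops, one possible approach is sketched below. It assumes the rows and columns of your matrix carry the same labels; the actor names and values are invented for illustration.

```python
import pandas as pd
import networkx as nx

# Hypothetical actor-by-actor matrix; names and values are made up
names = ["actor_a", "actor_b", "actor_c"]
df = pd.DataFrame(
    [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.3],
     [0.0, 0.3, 1.0]],
    index=names,
    columns=names,
)

# Row/column labels become the node IDs instead of 0, 1, 2, ...
G = nx.from_pandas_adjacency(df)

# Drop the diagonal entries, which would otherwise show up as self-loops
G.remove_edges_from(list(nx.selfloop_edges(G)))

print(list(G.nodes()))
print(list(G.edges(data=True)))
```

    from_pandas_adjacency expects the index and the columns to contain the same labels, so it only applies once the matrix is square (for example, actors by actors).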

    It is also worth noting that NetworkX will treat the normalised values in your cells as the weight of the edge/tie, since they don't simply reflect a binary relationship (where 1 represents a connection and 0 the absence of a connection). As such, NetworkX will create a weighted rather than a binary graph (although there are ways to remove the weights and convert it to a simple binary graph). In my example, I also added weights (all of 0.5) rather than binary connections to demonstrate what will happen. If I print the edge attribute data for my created network, you will see that weights have automatically been added:

        print(G.edges(data=True))  # data=True shows all edge attributes
    

    Returns:

        [(0, 0, {'weight': 0.5}), (1, 1, {'weight': 0.5}), (2, 2, {'weight': 0.5}), (3, 3, {'weight': 0.5}), (4, 4, {'weight': 0.5})]
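
    On converting to a simple binary graph: two possible ways are sketched below, using a small invented matrix. The first keeps every tie and discards the weights; the second keeps only ties at or above a cut-off. The cut-off of 0.3 is an arbitrary assumption, not a recommendation for your data.

```python
import numpy as np
import networkx as nx

# Hypothetical weighted matrix (illustrative values only)
A = np.array([
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.0],
    [0.2, 0.0, 0.0],
])
G = nx.from_numpy_array(A)

# Option 1: keep every tie but drop the weight attribute
B = nx.Graph()
B.add_nodes_from(G.nodes())
B.add_edges_from(G.edges())

# Option 2: keep only ties whose weight reaches an (arbitrary) threshold
threshold = 0.3
T = nx.Graph()
T.add_nodes_from(G.nodes())
T.add_edges_from((u, v) for u, v, w in G.edges(data="weight") if w >= threshold)

print(list(B.edges(data=True)))
print(list(T.edges(data=True)))
```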
    

    Anyway, hopefully this will at least help with creating the graph in NetworkX!