python network-programming networkx social-networking

How to make a network graph from normalised data

I am new to this area. I have a dataset of actors and movies.

I'm trying to do network analysis and find communities. I took my data, multiplied the matrix by its transpose and normalised the result. Now I want to turn it into a network graph. I tried the networkx library but couldn't get it to work. I don't have any experience, so I'm open to all suggestions.

Solution

You can use NetworkX to map this network and hopefully I can help with this part!

First, you will need to import the NetworkX library (using import networkx as nx).

Next, import the data. Since your data is in matrix format (rather than a simple edge list), it is a little more complicated to import into NetworkX. I would suggest first converting the data into a NumPy array and then creating the graph using the NetworkX from_numpy_array function (older NetworkX releases also offered from_numpy_matrix, but it was removed in NetworkX 3.0).

I will run through an example using dummy data. A screenshot of this data can be found here (a simplified version of your normalised dataset).

This is the code I used to import the data and create the graph:

    import numpy as np
    import pandas as pd
    import networkx as nx

    df = pd.read_csv('matrixdata.csv', sep=',', index_col=0)  # read data; the first column holds the row labels
    matrix = np.asarray(df.values)  # convert data to a NumPy array
    G = nx.from_numpy_array(matrix)  # create graph in NetworkX (from_numpy_matrix was removed in NetworkX 3.0)

Now I can print, for example, the number of edges in my network using print(len(G.edges())), which returns 5 (since I made a 5x5 matrix with only 5 connections).
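If you want to try this without a CSV file, here is a minimal sketch that builds a comparable matrix in memory. The data is an assumption on my part (a 5x5 matrix with five cells of 0.5, all on the diagonal), standing in for matrixdata.csv:

```python
import numpy as np
import pandas as pd
import networkx as nx

# Hypothetical stand-in for matrixdata.csv: a 5x5 matrix whose only
# non-zero cells are five values of 0.5 on the diagonal.
df = pd.DataFrame(np.eye(5) * 0.5)

# Every non-zero cell (i, j) becomes an edge between node i and node j.
G = nx.from_numpy_array(df.to_numpy())

print(G.number_of_nodes())
print(G.number_of_edges())
```

G.number_of_edges() is equivalent to len(G.edges()) here, and a diagonal cell counts as one edge from a node to itself.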

From here, we can perform a number of measures on the network (e.g. density, degree etc.).
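As a rough illustration of such measures, the sketch below computes density and degree on a small made-up matrix. The 4x4 values are hypothetical and chosen to be symmetric with a zero diagonal, so they are not your data:

```python
import numpy as np
import networkx as nx

# Hypothetical symmetric 4x4 matrix with a zero diagonal (no self-loops).
A = np.array([
    [0.0, 0.5, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
])
G = nx.from_numpy_array(A)

density = nx.density(G)                     # share of possible edges that exist
degree = dict(G.degree())                   # number of connections per node
strength = dict(G.degree(weight="weight"))  # sum of edge weights per node

print(density)
print(degree)
print(strength)
```

Passing weight="weight" to G.degree sums the edge weights instead of counting edges, which is usually the more informative figure for normalised data.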

However, just a word of caution about your data as pictured above. In the normalised version, all of the node IDs have been changed from films (in rows) and actors (in columns) to standard integers (starting from 0 in both the columns and the rows). NetworkX will treat these as your node IDs, which means that it will think that node 0 in your first column is the same as node 0 in your first row. Possibly this is your intention because you have normalised the data, but it is worth flagging. One repercussion is that any non-zero value on the diagonal is treated as a 'self-directed' connection (a self-loop): in my dummy data, where all five connections sit on the diagonal, node 0 only connects to node 0, node 1 only connects to node 1 and so on.
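If the integer IDs or the self-directed connections are not what you want, one possible approach is to relabel the nodes from the DataFrame index and then drop the self-loops. This is only a sketch: the actor names and values below are made up, and it assumes the rows and columns of your matrix describe the same set of things (which should be the case after multiplying a matrix by its transpose):

```python
import pandas as pd
import networkx as nx

names = ["Actor A", "Actor B", "Actor C"]  # hypothetical labels
df = pd.DataFrame(
    [[1.0, 0.4, 0.0],
     [0.4, 1.0, 0.3],
     [0.0, 0.3, 1.0]],
    index=names,
    columns=names,
)

G = nx.from_numpy_array(df.to_numpy())

# Replace the integer IDs 0, 1, 2 with the labels from the DataFrame index.
G = nx.relabel_nodes(G, dict(enumerate(df.index)))

# Remove the edges created by the diagonal cells (node connected to itself).
# list(...) is needed because the graph is modified while removing.
G.remove_edges_from(list(nx.selfloop_edges(G)))

print(list(G.nodes()))
print(G.number_of_edges())
```

NetworkX also provides nx.from_pandas_adjacency(df), which reads the labels directly from the DataFrame, if you prefer to skip the relabelling step.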

It is also worth noting that the normalised values in your cells will be treated by NetworkX as the weight of the edge/tie, since they don't simply reflect a binary relationship (where 1 represents a connection and 0 the absence of a connection). As such, NetworkX will create a weighted rather than binary graph (although there are ways you can remove weights and convert to a simple binary graph). In my example, I also added weights (all of 0.5) rather than binary connections to demonstrate what will happen. If I print the edge attribute data for my created network, you will see that weights have automatically been added:

    print(G.edges(data=True))  # data=True shows all edge attributes

Returns:

    [(0, 0, {'weight': 0.5}), (1, 1, {'weight': 0.5}), (2, 2, {'weight': 0.5}), (3, 3, {'weight': 0.5}), (4, 4, {'weight': 0.5})]
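On removing the weights: one simple way (a sketch, not the only option) is to copy just the nodes and edges into a fresh graph, leaving the attributes behind. The 3x3 matrix here is hypothetical:

```python
import numpy as np
import networkx as nx

# Hypothetical weighted matrix.
A = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.2],
    [0.0, 0.2, 0.0],
])
G = nx.from_numpy_array(A)  # weighted graph

# Build a binary (unweighted) copy: same nodes and edges, no attributes.
H = nx.Graph()
H.add_nodes_from(G.nodes())
H.add_edges_from(G.edges())  # G.edges() yields (u, v) pairs without the weights

print(list(G.edges(data=True)))
print(list(H.edges(data=True)))
```

Every non-zero cell becomes a plain connection in H, however small the value. If weak ties should not count, you would need to filter the edges by weight first, and the cut-off is a judgement call for your data.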

Anyway, hopefully this will at least help with creating the graph in NetworkX!