I have a problem representing website user behaviour as an adjacency matrix in Python. I want to analyze user interaction across 43 different websites to see which websites are used together.
The data set has about 13,000,000 lines with the following structure:
user website
id1 web1
id1 web2
id1 web2
id2 web1
id2 web2
id3 web3
id3 web2
I would like to visualize the interactions between the websites in an adjacency matrix like this:
     web1 web2 web3
web1    2    2    0
web2    2    4    1
web3    0    1    1
I would be grateful for any advice.
import numpy as np
import scipy.sparse
data = """
id1 web1
id1 web2
id1 web2
id2 web1
id2 web2
id3 web3
id3 web2
"""
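# Split the text into (user, website) pairs
data = np.array(data.split()).reshape(-1, 2)
# Map each distinct user and website to an integer index
_, i = np.unique(data[:, 0], return_inverse=True)
_, j = np.unique(data[:, 1], return_inverse=True)
# Users-by-websites matrix; duplicate (user, website) rows are summed
incidence = scipy.sparse.coo_matrix((np.ones_like(i), (i, j)))
# Websites-by-websites co-occurrence counts
adjacency = incidence.T @ incidence
print(adjacency.todense())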
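Note that repeated (user, website) rows are summed in the incidence matrix, so a user who visits web2 twice contributes more than once to every web2 entry. If each user should count at most once per website, the incidence matrix can be made binary before multiplying. The following is only a sketch under that assumption; the function name `cooccurrence` and the `count_repeats` flag are made up for illustration. With the sample data its off-diagonal entries match the table in the question, while the diagonal holds the number of distinct users per website (3 for web2, not 4).

```python
import numpy as np
import pandas as pd
import scipy.sparse


def cooccurrence(users, sites, count_repeats=False):
    """Return a labelled website-by-website co-occurrence table.

    Hypothetical helper: with count_repeats=False every user counts
    at most once per website.
    """
    # Map each distinct user / website to an integer index
    user_labels, i = np.unique(np.asarray(users), return_inverse=True)
    site_labels, j = np.unique(np.asarray(sites), return_inverse=True)
    incidence = scipy.sparse.coo_matrix(
        (np.ones(len(i), dtype=np.int64), (i, j)),
        shape=(len(user_labels), len(site_labels)),
    ).tocsr()  # converting to CSR sums duplicate (user, website) entries
    if not count_repeats:
        incidence.data[:] = 1  # collapse repeat visits to a single 1
    adjacency = (incidence.T @ incidence).toarray()
    return pd.DataFrame(adjacency, index=site_labels, columns=site_labels)


users = ["id1", "id1", "id1", "id2", "id2", "id3", "id3"]
sites = ["web1", "web2", "web2", "web1", "web2", "web3", "web2"]
table = cooccurrence(users, sites)
print(table)
```

Passing `count_repeats=True` reproduces the summed counts of the snippet above. Wrapping the result in a pandas DataFrame is just a convenient way to keep the website names as row and column labels.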