I have user/item view data in the following form:
User   Item
Louis  1
Louis  2
Adam   1
Adam   3
I want to transform it into an item-by-item matrix like this:
   1  2  3
1  0  1  1
2  1  0  0
3  1  0  0
Each value represents the number of people who viewed item i and also viewed item j. (The diagonal values do not matter.)
Is there an efficient way of doing this?
Below is my code, but it takes a long time to run with around 50k items and 500k view records.
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix

raw = pd.DataFrame(columns=['user','item'])
raw['user']=['Louis','Louis','Adam','Adam']
raw['item']=[1,2,1,3]

item_list = raw.item.unique().tolist()
user_list = raw.user.unique().tolist()

m = lil_matrix((len(raw.item.unique()),len(raw.item.unique())))

for user in user_list:
    temp = raw.loc[np.in1d(raw['user'], user)].item
    if len(temp) > 1:
        for idx1, id1 in enumerate(temp[0:-1]):
            for id2 in temp[idx1+1:]:
                m[item_list.index(id1),item_list.index(id2)]+=1
                m[item_list.index(id2),item_list.index(id1)]+=1

m.toarray()
You could use pd.crosstab followed by a dot product (here df holds the same data, with columns User and Item):
In [147]: dff = pd.crosstab(df.Item, df.User)
In [148]: dff = dff.dot(dff.T)
In [149]: np.fill_diagonal(dff.values, 0)
In [150]: dff
Out[150]:
Item  1  2  3
Item
1     0  1  1
2     1  0  0
3     1  0  0
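
At the scale mentioned in the question (around 50k items and 500k view records), a dense crosstab may use a lot of memory. The same idea can be kept sparse with scipy.sparse. This is only a rough sketch: it assumes the question's raw frame with columns user and item, and the names pairs, incidence, co_views and result are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Sample data from the question (column names 'user' and 'item' assumed).
raw = pd.DataFrame({'user': ['Louis', 'Louis', 'Adam', 'Adam'],
                    'item': [1, 2, 1, 3]})

# Drop repeated (user, item) rows so each user counts once per item.
pairs = raw.drop_duplicates()

# Map users and items to consecutive integer codes (order of first appearance).
user_codes, users = pd.factorize(pairs['user'])
item_codes, items = pd.factorize(pairs['item'])

# Sparse user-by-item incidence matrix: 1 where the user viewed the item.
incidence = csr_matrix(
    (np.ones(len(pairs), dtype=np.int64), (user_codes, item_codes)),
    shape=(len(users), len(items)),
)

# Item-by-item co-view counts: entry (i, j) is the number of users
# who viewed both item i and item j.
co_views = incidence.T @ incidence

# The diagonal (views per item) does not matter here, so zero it out.
co_views.setdiag(0)
co_views.eliminate_zeros()

# Dense, labelled view for inspection only; keep co_views sparse at scale.
result = pd.DataFrame(co_views.toarray(), index=items, columns=items)
print(result)
```

With real data you would probably skip the final toarray() call and work with co_views directly, since a dense 50k x 50k array is large.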