Pandas: How to summarize pandas cross table / frequency matrix

I'd like to summarize the results of a cross table/Frequency Matrix which looks at frequency of users having the same session, results is 4044 rows × 4044 columns

UserID  10  50  30  2488  9416 23197            ... 
UserID                                                                                  
10      4   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
50      0   48  2   9   4   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
30      0   2   2   2   2   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
2488    0   9   2   32  4   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
9416    0   4   2   4   4   0   0   0   0   0   ... 0   0   0   0   0   0   0   0

Is there a way to summarize it to have user-to-user number of matches, something like:

UserID  UserID Occurance
10       50      2
30       2488    5
23197    10      3
30       50      1

Solution

Solution:

Create a boolean mask that selects the upper triangular, non-diagonal values from the starting matrix
Reshape both this mask (with .reshape()) and the original matrix (with .stack()) into column vectors of equal size
Use boolean indexing to pick out the rows that you need.

Example:

import pandas as pd
import numpy as np

np.random.seed(1)

# Example data
df = pd.DataFrame(pd.np.random.randint(0, 4, size=(5, 5)), 
                  index=[10, 50, 30, 2488, 9416], 
                  columns=[10, 50, 30, 2488, 9416])

# Quick and dirty method to make the example data symmetric
df = df + df.T

df
      10    50    30    2488  9416
10       2     4     0     0     5
50       4     6     2     5     1
30       0     2     0     4     3
2488     0     5     4     4     0
9416     5     1     3     0     6

# To select the upper-triangular, non-diagonal entries,
# take a *lower*-triangular mask, np.tril, 
# and negate it with ~.
mask = (~np.tril(np.ones(df.shape)).astype('bool'))
mask
array([[False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False,  True,  True],
       [False, False, False, False,  True],
       [False, False, False, False, False]])

# Prepare to select rows from the stacked df
mask = mask.reshape(df.size)

# Stack the columns of the starting matrix into a MultiIndex, 
# which results in a MultiIndexed Series;
# select the upper-triangular off-diagonal rows;
# reset the MultiIndex levels into columns
df.stack()[mask].reset_index().rename({'level_0': 'UserID_row', 
                                       'level_1': 'UserID_col', 
                                       0: 'Occurrence'}, axis=1)
   UserID_row  UserID_col  Occurrence
0          10          50           4
1          10          30           0
2          10        2488           0
3          10        9416           5
4          50          30           2
5          50        2488           5
6          50        9416           1
7          30        2488           4
8          30        9416           3
9        2488        9416           0