I'm implementing some iterative algorithms for calculating the PageRank of a webgraph, and I'm having some trouble finding the best way to store some matrices in memory.
I have an n x n matrix B, which represents the webgraph (B[i,j] = 1/outdegree[j] if there's an arc from j to i, 0 otherwise; outdegree[j] is the number of outgoing arcs from node j) and which I'm storing as a scipy.sparse.dok_matrix since of course it has mostly 0 entries. The problem is that I have to compute many matrix-vector products of the type Px
, where

P = B + (1/n)*e*d^T

where e is the all-ones vector and d is a boolean vector that has 1 in component j if outdegree[j] > 0. Basically e*d^T is a linear algebra "trick" to write an n x n matrix whose columns are either all 1s or all 0s, depending on whether the corresponding entry of d is 1 or 0.
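For concreteness, here is a small sketch of how B, outdegree and d could be built; the 4-node edge list `arcs` is made-up example data, not from the original question:

```python
import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical toy webgraph given as (source, target) arcs.
# Node 3 has no outgoing arcs, i.e. it is a dangling node.
arcs = [(0, 1), (0, 2), (1, 2), (2, 0)]
n = 4

# outdegree[j] = number of arcs leaving node j.
outdegree = np.zeros(n)
for j, i in arcs:
    outdegree[j] += 1

# B[i, j] = 1/outdegree[j] if there is an arc j -> i, else 0.
B = dok_matrix((n, n))
for j, i in arcs:
    B[i, j] = 1.0 / outdegree[j]

# d[j] = 1 iff node j has at least one outgoing arc.
d = (outdegree > 0).astype(float)
```

Each column of B corresponding to a non-dangling node then sums to 1, which is what makes B column-stochastic on those columns.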
So I'm struggling with two, not totally independent, things:

1. e*d.T simply computes the scalar product, while I want a matrix. I guess it's some clever use of broadcasting and slicing, but I'm still new to numpy and can't figure it out.
2. Storing P as above (supposing I've found a solution to 1.), I lose the memory advantage I gained by storing B as a sparse matrix, and suddenly need to store n^2 floats. In any case, the matrix I'm adding to B is very redundant (there are only two types of columns), so there must be a better way than storing the whole matrix in memory.

Any suggestions? Keep in mind that whatever is stored has to easily allow computation of P.dot(x) for an arbitrary vector x.

For simplicity, as expressions with np.dot
will be bulky, let ∙ denote matrix multiplication, let e, d and x be vectors, i.e. of shape (n, 1), and in expressions with square brackets let * be Python's list multiplication. Then, by associativity
(e∙d.T)∙x = e∙(d.T∙x) = [[d.T∙x] * n]
where d.T∙x is a scalar, and

P∙x = B∙x + 1/n * e∙d.T∙x = B∙x + 1/n * [[d.T∙x] * n]

so to do your computations you only need to store the vector d. Note that d.T∙x (or equivalently np.dot(d.T, x) if arrays are used) is an inner product of two vectors and is a cheap operation relative to a matrix multiplication.
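A minimal sketch of this, assuming B is the sparse matrix and d the dangling-node indicator from the question (the concrete 4-node values below are made up for illustration): P.dot(x) is computed without ever materializing P, and checked once against the dense n^2 construction.

```python
import numpy as np
from scipy.sparse import dok_matrix

n = 4
# Hypothetical sparse B and outdegree-indicator vector d.
B = dok_matrix((n, n))
B[1, 0], B[2, 0], B[2, 1], B[0, 2] = 0.5, 0.5, 1.0, 1.0
d = np.array([1.0, 1.0, 1.0, 0.0])

x = np.array([0.25, 0.25, 0.25, 0.25])

# P.dot(x) without forming P: e * (d.T @ x) is just the scalar d @ x
# repeated in every component, so a broadcast multiply by ones suffices.
Px = B.dot(x) + (d @ x) / n * np.ones(n)

# One-off sanity check against the dense construction P = B + (1/n) e d^T.
P = B.toarray() + np.outer(np.ones(n), d) / n
assert np.allclose(Px, P.dot(x))
```

Only B (sparse) and d (length n) are ever stored, so memory stays O(arcs + n) instead of O(n^2), and each product costs one sparse matvec plus one inner product.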