Search code examples
pythonmatlablinear-algebrapagerank

PageRank python implementation, algorithm


I'm trying to write this matlab implementation in Python, but I don't understand a few moments:

last_v = ones(N, 1) * inf;

Do we obtain here a vector, containing all infinities? In this case the condition in while is instantly false and we don't obtain any iterations:

while(norm(v - last_v, 2) > v_quadratic_error)

What do I understand false?

This is the way, I tried it to do:

from numpy import *

def pagerank(M,d,v_quadratic_error):
    count = 0
    N=M.shape[1]
    v=random.rand(N,1)
    v=v/linalg.norm(v)
    ainf=array([[inf]])
    last_v = dot(ones((N,1)),ainf)
    R = d*M + ((1-d)/N * ones((N,N)))

    while linalg.norm(v-last_v,2) > v_quadratic_error:
        last_v = v
        v = dot(R,v)
        count+=1
        print 'iteration #', count
    return v

Solution

  • In Matlab/Octave:

    octave:4> last_v = ones(N, 1) * inf;
    octave:10> norm(v - last_v, 2)
    ans = Inf
    octave:13> norm(v - last_v, 2) > v_quadratic_error
    ans =  1
    

    In Python:

    In [139]: last_v = np.dot(np.ones((N,1)),ainf)
    In [140]: np.linalg.norm(v - last_v, 2)
    Out[140]: nan
    In [141]: np.linalg.norm(v - last_v, 2) <= v_quadratic_error
    Out[141]: False
    

    So the condition is True in Matlab/Octave but the similar expression in Python is False. In Python, instead of using

    while linalg.norm(v-last_v,2) > v_quadratic_error:
    

    you could use

    while True:
        last_v = v
        ...
        if np.linalg.norm(v - last_v, 2) <= v_quadratic_error: break
    

    This guarantees that the flow of execution enters the while-loop at least once, and then breaks when the condition is True. By that point last_v will have a finite value, so the NaN versus Inf issue is avoided.


    import numpy as np
    
    def pagerank(M, d, v_quadratic_error):
        count = 0
        N = M.shape[1]
        while True:
            v = np.random.rand(N, 1)
            if (v != 0).all(): break
        v = v / np.linalg.norm(v)
        R = d * M + ((1 - d) / N * np.ones((N, N)))
        while True:
            last_v = v
            v = np.dot(R, v)
            count += 1
            print('iteration # {}: {}'.format(count, np.isfinite(v)))
            if np.linalg.norm(v - last_v, 2) <= v_quadratic_error: break
        return v
    
    M = np.array(np.mat('0 0 0 0 1 ; 0.5 0 0 0 0 ; 0.5 0 0 0 0 ; 0 1 0.5 0 0 ; 0 0 0.5 1 0'))
    print(pagerank(M, 0.80, 0.001))
    

    yields (something like)

    [[ 0.46322263]
     [ 0.25968575]
     [ 0.25968575]
     [ 0.38623472]
     [ 0.48692059]]