Search code examples
algorithmgraphgraph-theoryweighted

How to find probability of path in a directed graph?


I have a directed weighted graph G=(V,E).

In this graph the weight of edge(v[i],v[j]) is the count of transition between v[i] and v[j].

I am trying to determine the best way to accomplish task: how to find the probability P of path from node A to node B

All weights are positive integers.

For example,

weight(A,B)=count of transition from A to B
weight(B,C)=count of transition from B to C
weight(B,D)=count of transition from B to D
weight(D,C)=count of transition from D to C

And we have several paths:

A->B->C 
A->B->D->C

So, how to calculate probability P of these paths and choose the best one?


Solution

  • It can be solved by reducing the problem to shortest path problem, assuming we are indeed discussing probabilities (that means, each weight is in range [0,1]).

    Let the graph be G=(V,E), and the probability between two vertices denoted as w(u,v).

    Define: w'(u,v) = -log(w(u,v))

    The shortest path from some node s to some node t is then the shortest path in the graph when using w' as weight function. You can find the shortest path using Dijkstra's algorithm or Bellman-Ford algorithm

    Proof:

    For a path v1->v2->...->vn, its probability is w(v1,v2) * w(v2,v3) * ... * w(vn-1,vn).

    When using w' as the weight, the cost of this path when using any shortest path algorithm is:

    d(v1,vn) = w'(v1,v2) + w'(v2,v3) + ... + w'(vn-1,vn) = 
    d(v1,vn) = -log(w(v1,v2)) + -log(w(v2,v3) + ... + -log(w(vn-1,vn)) = 
    d(v1,vn) = -1* [ log(w(v1,v2)) + log(w(v2,v3)) + ... + log(w(vn-1,vn)) =
    d(v1,vn) = -1* log(w(v1,v2) * w(v2,v3) * ... * w(vn-1,vn)) 
    

    This obviously applies also for the minimal path found from s to t.
    This means that this path have minimal value of:

    s(s,t) = -1* log(w(s,v2) * w(v2,v3) * ... * w(vn-1,t)) 
    

    And since log is monotonic function, it is also minimal value of -1 * w(s,v2) * w(v2,v3) * ... * w(vn-1,t), which is the maximal value of w(s,v2) * w(v2,v3) * ... * w(vn-1,t), and that's exactly the probability.