Search code examples
networkxgraph-theoryprobabilitypath-finding

shared probability of all paths between two nodes in a weighted limited graph


I have a weighted graph G=(V,E)

In this graph the weight of edge(v[i],v[j]) is the probability(p) of a direct transition between v[i] and v[j] (0<p<1).

For exemple: graph exemple

I am trying to calculate the probability of transition between 2 nodes v, u.

What I did so far:

I found all paths from v to u (using all_simple_paths )

[v -0.1-> a -0.2-> c -0.3-> u]

[v -0.1-> a -0.2-> c -0.2-> b -0.4-> d -0.5-> u]

[v -0.1-> b -0.2-> c -0.3-> u]

[v -0.1-> b -0.4-> d -0.5-> u]

I an calculate the probability of any individual path (by multply edges), but I don't sure how to union the probability of the different paths with shared nodes. I think I can do that by union and intersection between paths, but I'm searching after a simple programable algorithm.


Solution

  • To calculate the probability of all the events in a path occurring, multiply the probabilities of the individual events.

    e.g. Two transitions of even odds will both happen 1 time in 4

    To calculate the probability of one or more of several independent events occurring p(A or B) = p(A) + p(B) – p(A and B).

    Your example:

    enter image description here

    p(D) = 0.5
    p(C) = 0.3
    p(B) = 0.2 + 0.06 - 0.012 = 0.248
    p(A) = 0.3 * 0.2 = 0.06
    p(V) = 0.0248 + 0.006 - 0.0001488 = 0.0306512
    

    The psuedocode algorithm looks like this

    - Mark all nodes 'uncalculated'
    - LOOP over nodes
        - IF node has outlinks but no inlinks
            - LOOP over all paths from node to end node
                - LOOP PN over nodes in path
                     - LOOP I over inlinks to PN
                         - IF source node on I  'uncalculated'
                             - BREAK out of PN loop
                         - SET IP(I) = I probability * source node probability
                     - ENDLOOP I 
                     - IF 0 inlinks
                         - SET PN probability = 1
                     - IF 1 inlinks
                         - SET PN probability = IP[0]
                     - IF 2 inlinks
                         - SET PN probability = IP[0] + IP[1] - IP[0] * IP[1]
                     - IF inlinks > 2
                         - ask user to refactor input
                         - STOP
                     - IF PN = end node
                         - OUTPUT end node probability
                         - STOP
    

    Here is the C++ code to do this ( it is far too complex to be coded in Python ):

    void cPathFinder::probs()
            {
                // The probabilities are stored in the link and node attribute myCost
                // Mark all node probabilities as 'not yet calculated'
                for (auto &n : myG)
                    n.second.myCost = -1;
    
                // loop over nodes
                for (auto &n : myG)
                {
                    // check for possible starting node
                    // i.e one with out edges and no in edges
                    if (!(outDegree(n.second) && (!inDegree(n.second))))
                        continue;
    
                    // iterate over all paths from starting node to target node
                    visitAllPaths(
                        node(n.second),
                        myEnd,
                        [&, this](int length) -> int
                        {
                            // loop over nodes in path
                            for (int n : myPath)
                            {
                                if (n < 0)
                                    continue;
    
                                // loop over inlinks
                                std::vector<double> vprob;
                                bool fOK = true;
                                for (const auto l : inlinks(n))
                                {
                                    double prevNodeProb = node(l.first.first).myCost;
                                    if (prevNodeProb == -1)
                                    {
                                        // the previous node probability has not been calculated yet
                                        // no need to look at any more inlinks
                                        fOK = false;
                                        break;
                                    }
                                    // store the probability contribution from this inlink
                                    // it is the product of the source node proabability and the link probability
                                    vprob.push_back(
                                        prevNodeProb * l.second->myCost);
                                }
                                // check if there is enough information
                                // to calculate the probability for this node
                                if (!fOK)
                                    break;
    
                                // all the previous nodes are calculated
                                // calculate this node's probability
                                switch (vprob.size())
                                {
                                case 0:
                                    //starting node, assume probability of 100%
                                    node(n).myCost = 1;
                                    break;
    
                                case 1:
                                    // one inlink, prob is previous node prob times link probability
                                    node(n).myCost = vprob[0];
                                    break;
    
                                case 2:
                                    // two inlinks
                                    node(n).myCost =
                                        vprob[0] + vprob[1] - vprob[0] * vprob[1];
                                    break;
                                    
                                default:
                                    /*  More then two inlinks
                                    The code does not handle this
                                    but note that multiple inlinks can alwayd be reduced to a series of
                                    nodes with 2 inlinks
                                    */
                                    throw std::runtime_error(
                                        userName(n) + " has more than 2 inlinks, please refactor input");
                                }
                            }
    
                            return 0;
                        });
                }
    
                std::stringstream ss;
                ss << "\nfinal node " << userName(myEnd) << " probability "
                   << node(myEnd).myCost << "\n";
                myResults = ss.str();
            }
    
    

    Output from sample run for your example

    input
    l u c 0.3
    l u d 0.5
    l c a 0.2
    l c b 0.2
    l d b 0.4
    l a v 0.1
    l b v 0.1
    e v
    
    path: u c a v 
    path: u c b v
    path: u d b v
    
    node probs
    c 0.3
    u 1
    d 0.5
    a 0.06
    b 0.248
    v 0.0306
    
    final node v probability 0.0306
    

    More complete documentation at https://github.com/JamesBremner/PathFinder3/wiki/Probabilities

    The complete code for this application is at https://github.com/JamesBremner/PathFinder3