Search code examples
mathgraph-theoryprobability-theorysnanetwork-analysis

In a random graph: What is the probability that a node has a link to any node on a list x defined special nodes?


I have this problem for a calculation I'm doing on an observed network.

Let's imagine a random graph G(N,p) where N is the number of nodes and p is the probability of an edge being formed between any node ni and nj. The graph is undirected.

Let's then mark an amount of x nodes, say 5, as special. What is then the probability (ps) of a node to have an edge to any of these special nodes.

I have disturbingly few ideas how to go about this myself. I suppose that the answer will be in two steps:

Firstly, because I imagine that I will have to conceder all possible graphs of N nodes to make events for my probability calculation. I think there might be S(S-1)/2 possible graphs if S=N(N-1)/2, but these are not equally likely, so I'm at a loss. Secondly I understand that the probability of links to special nodes must approach 1 as the number of special nodes (x) approach N, and that ps=p if x=1.

Thankful for any hints. Thanks


Solution

  • For a non-special node, there are x potential edges from that node to a special node. For any such potential edge the probability of the edge not being in the graph is 1-p. Assuming independence, the probability that it avoids all special nodes is (1-p)^x. The complementary probability is what you seek, it is

    1 - (1-p)^x
    

    For special nodes, the probability that a given special node is connected to one of the other special nodes is

    1 - (1-p)^(x-1)
    

    You can combine these answers in various ways. Pick a node at random. The probability that it is either special or has an edge connecting it to a special node is:

    x/N + (N-x)/N * [1-(1-p)^x]
    

    the probability that it has an edge connecting to a special node is:

    x/N * [1 - (1-p)^(x-1)] + (N-x)/N * [1 - (1-p)^x]
    

    in all cases -- these tend to 1 as x tends to N.

    Since this is Stack Overflow, a bit of programming is in order. Here is a Python 3 Monte Carlo simulation that seems to suggest the accuracy of the formula for the probability that a randomly selected node is either special or adjacent to a special:

    import random
    
    #The following function returns a random graph on nodes
    #0,1,...,N-1 where edges are chosen with probability p
    #The graph is returned as a dictionary keyed by the 
    #The corresponding values are sorted lists of adjacent nodes
    
    def randGraph(N,p):
    
        #initialize G:
        G = {}
        for i in range(N):
            G[i] = []
    
        #iterate through potential edges:
        for i in range(N-1):
            for j in range(i+1,N):
                if random.random() < p:
                    G[i].append(j)
                    G[j].append(i)
    
        #sort adjacency lists before returning:
        for i in G:
            G[i].sort()
        return G
    
    #A function to determine the number of nodes
    #in a graph that are either
    #special or connected to a special,
    #where special means: 0,1,2,...,x
    
    def specialsAndFriends(G,x):
        count = 0
        for i in G:
            if (i<x) or (len(G[i]) > 0 and G[i][0] < x):
                count +=1
        return count
    
    #now -- a Monte Carlo simulation:
    
    def estimateProb(N,p,x,trials = 1000):
        estimates = []
        for i in range(trials):
            G = randGraph(N,p)
            estimates.append(specialsAndFriends(G,x)/N)
        return sum(estimates)/trials
    
    #compare with:
    
    def exactProb(N,p,x):
        return x/N + (N-x)/N * (1 - (1-p)**x)
    

    (Python 2 would need to adjust e.g. x/N to float(x)/N).

    Sample output:

    >>> estimateProb(100,0.25,10)
    0.9496800000000086
    >>> 
    >>> exactProb(100,0.25,10)
    0.9493178367614746