math graph-theory probability-theory sna network-analysis

In a random graph: What is the probability that a node has a link to any node on a list x defined special nodes?

I have this problem for a calculation I'm doing on an observed network.

Let's imagine a random graph G(N,p) where N is the number of nodes and p is the probability of an edge being formed between any node n_i and n_j. The graph is undirected.

Let's then mark an amount of x nodes, say 5, as special. What is then the probability (p_s) of a node to have an edge to any of these special nodes.

I have disturbingly few ideas how to go about this myself. I suppose that the answer will be in two steps:

Firstly, because I imagine that I will have to conceder all possible graphs of N nodes to make events for my probability calculation. I think there might be S(S-1)/2 possible graphs if S=N(N-1)/2, but these are not equally likely, so I'm at a loss. Secondly I understand that the probability of links to special nodes must approach 1 as the number of special nodes (x) approach N, and that p_s=p if x=1.

Thankful for any hints. Thanks

Solution

For a non-special node, there are x potential edges from that node to a special node. For any such potential edge the probability of the edge not being in the graph is 1-p. Assuming independence, the probability that it avoids all special nodes is (1-p)^x. The complementary probability is what you seek, it is

1 - (1-p)^x

For special nodes, the probability that a given special node is connected to one of the other special nodes is

1 - (1-p)^(x-1)

You can combine these answers in various ways. Pick a node at random. The probability that it is either special or has an edge connecting it to a special node is:

x/N + (N-x)/N * [1-(1-p)^x]

the probability that it has an edge connecting to a special node is:

x/N * [1 - (1-p)^(x-1)] + (N-x)/N * [1 - (1-p)^x]

in all cases -- these tend to 1 as x tends to N.

Since this is Stack Overflow, a bit of programming is in order. Here is a Python 3 Monte Carlo simulation that seems to suggest the accuracy of the formula for the probability that a randomly selected node is either special or adjacent to a special:

import random

#The following function returns a random graph on nodes
#0,1,...,N-1 where edges are chosen with probability p
#The graph is returned as a dictionary keyed by the 
#The corresponding values are sorted lists of adjacent nodes

def randGraph(N,p):

    #initialize G:
    G = {}
    for i in range(N):
        G[i] = []

    #iterate through potential edges:
    for i in range(N-1):
        for j in range(i+1,N):
            if random.random() < p:
                G[i].append(j)
                G[j].append(i)

    #sort adjacency lists before returning:
    for i in G:
        G[i].sort()
    return G

#A function to determine the number of nodes
#in a graph that are either
#special or connected to a special,
#where special means: 0,1,2,...,x

def specialsAndFriends(G,x):
    count = 0
    for i in G:
        if (i<x) or (len(G[i]) > 0 and G[i][0] < x):
            count +=1
    return count

#now -- a Monte Carlo simulation:

def estimateProb(N,p,x,trials = 1000):
    estimates = []
    for i in range(trials):
        G = randGraph(N,p)
        estimates.append(specialsAndFriends(G,x)/N)
    return sum(estimates)/trials

#compare with:

def exactProb(N,p,x):
    return x/N + (N-x)/N * (1 - (1-p)**x)

(Python 2 would need to adjust e.g. x/N to float(x)/N).

Sample output:

>>> estimateProb(100,0.25,10)
0.9496800000000086
>>> 
>>> exactProb(100,0.25,10)
0.9493178367614746