I have this problem for a calculation I'm doing on an observed network.
Let's imagine a random graph G(N,p) where N is the number of nodes and p is the probability of an edge being formed between any node ni and nj. The graph is undirected.
Let's then mark an amount of x nodes, say 5, as special. What is then the probability (ps) of a node to have an edge to any of these special nodes.
I have disturbingly few ideas how to go about this myself. I suppose that the answer will be in two steps:
Firstly, because I imagine that I will have to conceder all possible graphs of N nodes to make events for my probability calculation. I think there might be S(S-1)/2 possible graphs if S=N(N-1)/2, but these are not equally likely, so I'm at a loss. Secondly I understand that the probability of links to special nodes must approach 1 as the number of special nodes (x) approach N, and that ps=p if x=1.
Thankful for any hints. Thanks
For a non-special node, there are x
potential edges from that node to a special node. For any such potential edge the probability of the edge not being in the graph is 1-p
. Assuming independence, the probability that it avoids all special nodes is (1-p)^x
. The complementary probability is what you seek, it is
1 - (1-p)^x
For special nodes, the probability that a given special node is connected to one of the other special nodes is
1 - (1-p)^(x-1)
You can combine these answers in various ways. Pick a node at random. The probability that it is either special or has an edge connecting it to a special node is:
x/N + (N-x)/N * [1-(1-p)^x]
the probability that it has an edge connecting to a special node is:
x/N * [1 - (1-p)^(x-1)] + (N-x)/N * [1 - (1-p)^x]
in all cases -- these tend to 1 as x tends to N.
Since this is Stack Overflow, a bit of programming is in order. Here is a Python 3 Monte Carlo simulation that seems to suggest the accuracy of the formula for the probability that a randomly selected node is either special or adjacent to a special:
import random
#The following function returns a random graph on nodes
#0,1,...,N-1 where edges are chosen with probability p
#The graph is returned as a dictionary keyed by the
#The corresponding values are sorted lists of adjacent nodes
def randGraph(N,p):
#initialize G:
G = {}
for i in range(N):
G[i] = []
#iterate through potential edges:
for i in range(N-1):
for j in range(i+1,N):
if random.random() < p:
G[i].append(j)
G[j].append(i)
#sort adjacency lists before returning:
for i in G:
G[i].sort()
return G
#A function to determine the number of nodes
#in a graph that are either
#special or connected to a special,
#where special means: 0,1,2,...,x
def specialsAndFriends(G,x):
count = 0
for i in G:
if (i<x) or (len(G[i]) > 0 and G[i][0] < x):
count +=1
return count
#now -- a Monte Carlo simulation:
def estimateProb(N,p,x,trials = 1000):
estimates = []
for i in range(trials):
G = randGraph(N,p)
estimates.append(specialsAndFriends(G,x)/N)
return sum(estimates)/trials
#compare with:
def exactProb(N,p,x):
return x/N + (N-x)/N * (1 - (1-p)**x)
(Python 2 would need to adjust e.g. x/N to float(x)/N).
Sample output:
>>> estimateProb(100,0.25,10)
0.9496800000000086
>>>
>>> exactProb(100,0.25,10)
0.9493178367614746