I'm making some modifications to the load testing framework that we're using throughout the company, and this is a question for which I would love to have an answer.
I was under the impression that the following two approaches to generating a Poisson distribution would be equivalent, but I'm clearly wrong:
#!/usr/bin/env python
from numpy import average, random, std
from random import expovariate

def main():
    for count in 5.0, 50.0:
        # numpy's poisson draws event counts directly
        data = [random.poisson(count) for i in range(10000)]
        print('npy_poisson average with count=%d: ' % count, average(data))
        print('npy_poisson std_dev with count=%d: ' % count, std(data))
        # expovariate draws inter-arrival times for the given rate
        rate = 1 / count
        data = [expovariate(rate) for i in range(10000)]
        print('expovariate average with count=%d: ' % count, average(data))
        print('expovariate std_dev with count=%d: ' % count, std(data))

if __name__ == '__main__':
    main()
This results in output that looks like:
npy_poisson average with count=5: 5.0168
npy_poisson std_dev with count=5: 2.23685443424
expovariate average with count=5: 4.94383067075
expovariate std_dev with count=5: 4.95058985422
npy_poisson average with count=50: 49.9584
npy_poisson std_dev with count=50: 7.07829565927
expovariate average with count=50: 50.9617389096
expovariate std_dev with count=50: 51.6823970228
Why does the standard deviation scale proportionally with the count when I use the built-in random.expovariate, while numpy's random.poisson std_dev grows much more slowly (it looks roughly like log base 10 of count)?
Follow-up question: which one is more appropriate if you're simulating the frequency with which users interact with your service?
Because your assumptions are wrong. The mean and variance of a Poisson distribution are both lambda, hence its stdev is sqrt(lambda). The mean and variance of an exponential distribution are 1/lambda and 1/lambda^2 respectively, so its stdev is 1/lambda. Your code sets rate = 1/count, so the exponential stdev is 1/rate = count, which is exactly what you are seeing.
I'd suggest reading the Wikipedia article on queueing theory for your follow-up question.