c, paging, memory-address, probability-density

How best to model a (very) sparse probability density function?


I want to write a traffic generator that replicates the primitive read and write demands that are made on memory by a running computer.

But running computers also show (very) strong locality in their memory references: across a 64-bit address space only a very small range of addresses will be referenced (in fact I have tested this on one benchmark, and only about 9000 pages of the billions on offer are touched).

What is a good way to model such a sparse probability density function (ideally in C or C++)? I have probabilities for the benchmark but don't need to follow them too closely (I could just replay the benchmark references in any case, but I want something a bit more flexible).

To clarify, I also have data about how many reads should come from each page, but what I am interested in is picking the sequence of pages. (The Markov chain idea suggested in the comments might be the way to do this.)


Solution

  • For what it's worth, I decided to use a pretty crude hack along these lines: pick a random number between 0 and 1, then find the element in the distribution whose frequency/probability is equal to or greater than this number (if several elements qualify, pick the one with the minimum probability). Seems to work (I did this in R).
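
Since the question asked for C or C++, here is a rough C sketch of one way to read the hack above. It assumes the "frequency/probability" is cumulative, so the lookup becomes a binary search over running totals, and it stores only the pages that are actually touched, which keeps the table sparse. The names `page_freq`, `build_cumulative`, `pick_index` and `sample_page` are made up for illustration, and `rand()` is used only for brevity. This is not the R code from the answer, and it picks pages independently, so it does not capture ordering the way a Markov chain would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry per page that is actually touched; untouched pages are absent. */
typedef struct {
    uint64_t page;   /* page number (illustrative values only) */
    uint64_t count;  /* observed reference count for this page */
} page_freq;

/* Fill cum[i] with the running total of counts 0..i; return the grand total. */
uint64_t build_cumulative(const page_freq *pf, size_t n, uint64_t *cum)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += pf[i].count;
        cum[i] = total;
    }
    return total;
}

/* Return the index of the first entry whose cumulative count is greater
   than r, where 0 <= r < total. Binary search, O(log n). */
size_t pick_index(const uint64_t *cum, size_t n, uint64_t r)
{
    assert(n > 0);
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cum[mid] > r)
            hi = mid;       /* answer is mid or something before it */
        else
            lo = mid + 1;   /* answer is strictly after mid */
    }
    return lo;
}

/* Draw one page number, weighted by its count.
   rand() % total is slightly biased and limited by RAND_MAX;
   a better generator would be needed for serious use. */
uint64_t sample_page(const page_freq *pf, const uint64_t *cum,
                     size_t n, uint64_t total)
{
    uint64_t r = (uint64_t)rand() % total;
    return pf[pick_index(cum, n, r)].page;
}
```

With around 9000 touched pages, each draw costs roughly 13 comparisons, and the memory needed is proportional to the number of touched pages rather than to the size of the address space. Call `build_cumulative` once, then call `sample_page` for every reference you want to generate.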