c, linux, memory-management, hpc

When is a program limited by the memory bandwidth?


I want to know whether a memory-hungry program that I am using is limited by memory bandwidth.

When do you expect this to happen? Did it ever happen to you in a real-life scenario?

I found several articles discussing this issue, including:

The first link is a bit old, but suggests that you need to perform fewer than about 1-40 floating-point operations per floating-point variable in order to see this effect (correct me if I'm wrong).

How can I measure the memory bandwidth that a given program is using and how do I measure the (peak) bandwidth that my system can offer?

I don't want to discuss any complicated cache issues here. I'm only interested in the communication between the CPU and the memory.


Solution

  • To benchmark your system's memory performance, try the STREAM benchmark. Study the benchmark tasks and the results you get carefully, since they provide the basic data about your memory that you need for anything further. You do have to understand the caches: work out how they affect the results and at what point the bandwidth hits a peak.

    To figure out the memory performance of your program:

    1. Measure the execution time for a range of problem sizes.
    2. Calculate, by hand, how much data your program reads from and writes to memory for the same range of problem sizes.
    3. Divide the amount of data moved by the execution time.

    WARNING: this is a crude approach and should only be used to figure out whether you ought to pay attention to memory bandwidth issues. If your crude figuring tells you that your program uses less than 50% of the available memory bandwidth (the figures you got from the STREAM benchmark), then you shouldn't give it any more thought.

    This crude approach works best when your program manipulates relatively few very large data structures with simple access patterns. This does describe a lot of high-performance scientific programs but perhaps not a lot of other types of program.

    If your program is paging to disk (that is, relying on virtual memory beyond physical RAM) or doing I/O as it executes, then memory bandwidth is not the problem, at least not until you have sorted out disk bandwidth.

    Finally, yes: every time I run one of our scientific codes, the speed of execution is limited by memory bandwidth. As a rule of thumb, if a code achieves 10% of the FLOPS that the processor specification promises, I'm happy.