
Getting cycles per byte for my algorithm?


I know the theory but am having trouble with the practical implementation. I wrote an AES implementation in C, and now I would like to know how many cycles per byte it takes. As I understand it, I have to do the following (is that 100% right?):

  1. Calculate the speed of the algorithm in bytes per second
  2. Get the clock speed in hertz
  3. Divide the speed of the algorithm in bytes per second by the clock speed in hertz
  4. Take the reciprocal of the result from step 3.

Or, equivalently, in larger units:

  5. Measure the speed of the algorithm in gigabytes per second
  6. Divide the speed of the algorithm in gigabytes per second by the clock speed in gigahertz
  7. Take the reciprocal of the result from step 6.

Is it possible to do this in C/C++? How do I do it, and what should I use or look for?

I'm interested in Linux/Windows/Mac solutions.


Solution

  • This is just algebra, not an equation or a theory.

    If you already know bytes/second, and clock speed (cycles/second), then

    (bytes/second) / (cycles/second) => bytes/cycle
    1 / (bytes/cycle) => cycles/byte
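
As a worked instance of the arithmetic above, here is a minimal C sketch. The function name `cycles_per_byte` and its parameter names are illustrative, not taken from the answer, and it assumes both inputs are positive and use matching units (bytes/second with hertz, or gigabytes/second with gigahertz).

```c
#include <assert.h>

/* Convert a measured throughput and a clock frequency into cycles per byte.
   Names are hypothetical; the units of the two arguments must match
   (e.g. bytes/second and cycles/second). */
double cycles_per_byte(double bytes_per_second, double cycles_per_second)
{
    assert(bytes_per_second > 0.0 && cycles_per_second > 0.0);

    /* (bytes/second) / (cycles/second) => bytes/cycle */
    double bytes_per_cycle = bytes_per_second / cycles_per_second;

    /* 1 / (bytes/cycle) => cycles/byte */
    return 1.0 / bytes_per_cycle;
}
```

For instance, 1.5e9 bytes/second on a 3.0e9 Hz clock gives 2 cycles per byte. Passing 1.5 (GB/s) and 3.0 (GHz) gives the same result, because the scale factors cancel.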
    

    If you don't know bytes per second, you can calculate it by:

    1. get a high-resolution timestamp T1 suitable for this kind of measurement
    2. run your algorithm N times over B bytes
    3. get another timestamp T2
    4. subtract the timestamps one from the other, to give the elapsed time E = T2 - T1
    5. you have now processed (N * B) bytes in E time units
    6. repeat several times
    7. if your measurements are unstable, or your duration E is uncomfortably close to zero or suspiciously close to some system timer granularity, increase N and/or B and try again. Do this a few times anyway, to confirm that you get a linear relationship between bytes processed and time taken
    8. scale your time units (nanoseconds, microseconds, whatever they are) into seconds, if that's how you want to display the result
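
The steps above can be sketched in C roughly as follows. This assumes a POSIX system with `clock_gettime(CLOCK_MONOTONIC, ...)`. `process_buffer` is a hypothetical stand-in for the algorithm under test (an in-place chained transform, not AES), and `measure_bytes_per_second` and `now_seconds` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the algorithm being measured: transforms the
   buffer in place, chaining each byte into the next so that every pass
   depends on the previous one. Replace with the real routine. */
static void process_buffer(unsigned char *buf, size_t len)
{
    unsigned char carry = 0;
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(buf[i] * 31u + carry + 7u);
        carry = buf[i];
    }
}

/* Read a monotonic timestamp and scale it to seconds (steps 1, 3 and 8). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run n passes over b bytes and return the throughput in bytes per second,
   or 0.0 if the elapsed time was too small to measure. */
double measure_bytes_per_second(unsigned char *buf, size_t b, size_t n)
{
    assert(buf != NULL && b > 0 && n > 0);

    double t1 = now_seconds();              /* step 1: timestamp T1 */
    for (size_t i = 0; i < n; i++)          /* step 2: N runs over B bytes */
        process_buffer(buf, b);
    double t2 = now_seconds();              /* step 3: timestamp T2 */

    /* Read the last byte through a volatile so the work stays observable. */
    volatile unsigned char sink = buf[b - 1];
    (void)sink;

    double elapsed = t2 - t1;               /* step 4: E = T2 - T1 */
    if (elapsed <= 0.0)
        return 0.0;
    return (double)n * (double)b / elapsed; /* step 5: (N * B) bytes in E */
}
```

Steps 6 and 7 then amount to calling this function repeatedly with different N and B and comparing the results. With optimization enabled, it is worth checking that the compiler has not removed or reordered the timed work.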

    Note that if your "timestamp" above is actually a cycle counter, E is already measured in cycles, so cycles/byte is simply E / (N * B) and you can skip the cycles/second stage. Otherwise, you can just read off the CPU frequency from the system/hardware information tool for your platform.

    For POSIX, a sensible timer might be clock_gettime(CLOCK_THREAD_CPUTIME_ID,...), for example. You should be able to find example code for rdtsc, documentation for the best Windows timing function etc. by searching.
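
A minimal sketch of reading that POSIX per-thread CPU-time clock is shown below. The helper names `thread_cpu_time_ns` and `elapsed_ns` are illustrative, and the code assumes a platform that provides `CLOCK_THREAD_CPUTIME_ID`. It is not a Windows solution.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Return the CPU time consumed so far by the calling thread, in
   nanoseconds, or -1 if the clock could not be read. */
int64_t thread_cpu_time_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Elapsed CPU time E = T2 - T1 between two successful readings. */
int64_t elapsed_ns(int64_t t1, int64_t t2)
{
    assert(t1 >= 0 && t2 >= t1);
    return t2 - t1;
}
```

Because this clock counts only the CPU time charged to the calling thread, time spent descheduled or sleeping is not included, which is often what a throughput measurement wants.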


    As for actually taking the measurements, there are good suggestions in the comments. You need to:

    • take a large enough number of samples for the result to be reliable
    • ideally run with nothing else contending for resources or, failing that, under FIFO/realtime scheduling
    • either make sure any CPU clock scaling is turned off, or discard the first samples, taken while the CPU was still warming up
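
One way to combine these points is to collect many samples, drop the first few as warm-up, and summarize the rest. The sketch below uses the median as the summary, which is an assumption of this example rather than something the answer prescribes. The function name `median_after_warmup` is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Ordering function for qsort over doubles. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Discard the first `warmup` samples, sort the remaining ones in place,
   and return their median. `samples` holds `count` measurements, for
   example elapsed times or cycles/byte values, in the order taken. */
double median_after_warmup(double *samples, size_t count, size_t warmup)
{
    assert(samples != NULL && warmup < count);

    double *kept = samples + warmup;
    size_t n = count - warmup;
    qsort(kept, n, sizeof *kept, compare_doubles);

    if (n % 2 == 1)
        return kept[n / 2];
    return 0.5 * (kept[n / 2 - 1] + kept[n / 2]);
}
```

The median is less sensitive than the mean to occasional outliers caused by interrupts or scheduling. Taking the minimum is another common choice when the goal is a best-case figure.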