I know the theory but am having trouble with the practical implementation. I wrote an AES implementation in C, and now I would like to know how many cycles per byte it takes. As I understand it, I have to do one of the following (is that 100% right?):
- Calculate the speed of the algorithm in bytes per second
- Get the clock speed in hertz
- Divide the speed of the algorithm in bytes per second by the clock speed in hertz
- Take the reciprocal of the result of the previous step

or, alternatively:
- Measure the speed of the algorithm in gigabytes per second
- Divide the speed of the algorithm in gigabytes per second by the clock speed in gigahertz
- Take the reciprocal of the result of the previous step
Is it possible to do this in C/C++? How do I do it, and what should I use or look for?
I'm interested in Linux, Windows and Mac solutions.
This is just algebra, not an equation or a theory.
If you already know the bytes/second and the clock speed (cycles/second), then:
(bytes/second) / (cycles/second) => bytes/cycle
1 / (bytes/cycle) => cycles/byte
If you don't know bytes per second, you can calculate it by:
- get a high-resolution timestamp T1 suitable for this kind of measurement
- run your algorithm N times over B bytes
- get another timestamp T2
- subtract the timestamps one from the other, to give the elapsed time E = T2 - T1
- you have now processed (N * B) bytes in E time units
- repeat several times
- if your measurements are unstable, or your duration E is uncomfortably close to zero or suspiciously close to some system timer granularity, increase N and/or B and try again. Actually, do this a few times anyway to confirm that you get a linear relationship between bytes processed and time taken
- scale your time units (nanoseconds, microseconds, whatever they are) into seconds, if that's how you want to display the result
Note that if your "timestamp" above is actually a cycle counter, you can skip the cycles/second stage. Otherwise, you can just read off the CPU frequency from the system/hardware information tool for your platform.
For POSIX, a sensible timer might be `clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...)`, for example. You should be able to find example code for `rdtsc`, documentation for the best Windows timing function, etc. by searching.
As for actually taking the measurements, there are good suggestions in the comments. You need to:
- take a large enough number of samples for the result to be reliable
- ideally with nothing else contending for resources, perhaps even under FIFO/realtime scheduling
- either make sure any CPU clock scaling is turned off, or discard the first samples taken while the CPU was warming up