What are good tests to benchmark a crypto library?
Which unit (time, CPU cycles, ...) should we use to compare different crypto libraries?
Are there any tools or procedures?
Any ideas or comments are welcome!
Thank you for your inputs!
What are good tests to benchmark a crypto library?
The answers below are in the context of Crypto++. I don't know about other libraries, like OpenSSL, Botan, BouncyCastle, etc.
The Crypto++ library has a built-in benchmarking suite.
Which unit (time, CPU cycles, ...) should we use to compare different crypto libraries?
You typically measure performance in cycles per byte (cpb). Deriving cycles per byte from a timed run requires knowing the CPU frequency. A related metric is throughput measured in MB/s, which also depends on the CPU frequency.
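As a rough sketch of how the two metrics relate, the snippet below times a hash over a buffer and converts the result to MiB/s and cycles per byte. Python's hashlib is only a stand-in for the library under test, the function names are made up for illustration, and the CPU frequency is an assumed value you would replace with your machine's actual clock.

```python
import hashlib
import time

MIB = 1024 * 1024  # binary megabyte


def throughput_mib_s(num_bytes, seconds):
    """Bytes processed per second, expressed in MiB/s."""
    return num_bytes / seconds / MIB


def cycles_per_byte(num_bytes, seconds, cpu_hz):
    """Cycles spent per byte, assuming the core ran at cpu_hz throughout."""
    return cpu_hz * seconds / num_bytes


def time_hash(name="sha256", buf_size=MIB, min_seconds=0.5):
    """Hash a buffer repeatedly for at least min_seconds.

    Returns (total bytes hashed, elapsed seconds).
    """
    buf = bytes(buf_size)
    h = hashlib.new(name)
    total = 0
    start = time.perf_counter()
    while True:
        h.update(buf)
        total += buf_size
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return total, elapsed


if __name__ == "__main__":
    ASSUMED_CPU_HZ = 2.0e9  # assumption: replace with your CPU's real frequency
    n, t = time_hash()
    print(f"{throughput_mib_s(n, t):.1f} MiB/s, "
          f"{cycles_per_byte(n, t, ASSUMED_CPU_HZ):.2f} cpb")
```

Frequency scaling and turbo modes make the assumed clock unreliable, so treat the cycles-per-byte figure as an estimate unless the clock is pinned.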
Are there any tools or procedures?
git clone https://github.com/weidai11/cryptopp.git
cd cryptopp
make static cryptest.exe
# 2.0 GHz (use KB=1024; not 1000)
make bench CRYPTOPP_CPU_SPEED=1.8626
make bench will create a file called benchmark.html.
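The 1.8626 value appears to come from expressing a 2.0 GHz clock in units of 2^30 Hz rather than 10^9 Hz, which is what the KB=1024 comment hints at. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def cpu_speed_binary_ghz(decimal_ghz):
    """Convert a clock given in GHz (10^9 Hz) to units of 2^30 Hz."""
    return decimal_ghz * 1e9 / 2**30


print(f"{cpu_speed_binary_ghz(2.0):.4f}")  # prints 1.8626
```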
If you want to manually run the tests, then:
./cryptest.exe b <time in seconds> <cpu speed in GHz>
It will output an HTML-like table without <HEAD> and <BODY> tags. You will still be able to view it in a web browser.
You can also check the Crypto++ benchmark page at Crypto++ Benchmarks. The information is dated, and updating it is on our TODO list.
You also need acumen for what looks right. For example, SSE4.2 and ARMv8 have a CRC32 instruction. Cycles-per-byte should go from about 3 or 5 cpb (software only) to about 1 or 1.5 cpb (hardware acceleration). It should equate to a change of roughly 300 or 500 MB/s (software only) to roughly 1.5 GB/s (hardware acceleration) on modern hardware running around 2 GHz.
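One way to make that kind of sanity check concrete is to turn a cycles-per-byte band into the throughput band it implies and see whether a measurement falls inside it. This is only a sketch: the helper names are hypothetical, the 2 GHz clock is an assumption, and MB here means 10^6 bytes.

```python
def expected_mb_s(cpb, cpu_hz):
    """Throughput in MB/s (10^6 bytes) implied by a cycles-per-byte figure."""
    return cpu_hz / cpb / 1e6


def looks_plausible(measured_mb_s, cpb_low, cpb_high, cpu_hz):
    """True if the measurement lies in the band implied by [cpb_low, cpb_high]."""
    slowest = expected_mb_s(cpb_high, cpu_hz)  # more cycles per byte, less throughput
    fastest = expected_mb_s(cpb_low, cpu_hz)
    return slowest <= measured_mb_s <= fastest


CPU_HZ = 2.0e9  # assumed 2 GHz clock
for label, lo, hi in [("software CRC32", 3.0, 5.0), ("hardware CRC32", 1.0, 1.5)]:
    print(f"{label}: {expected_mb_s(hi, CPU_HZ):.0f} to "
          f"{expected_mb_s(lo, CPU_HZ):.0f} MB/s")
```

At 2 GHz this gives bands of about 400 to 667 MB/s for the software range and about 1333 to 2000 MB/s for the hardware range, the same ballpark as the figures quoted above.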
Other technologies, like SSE2 and NEON, are trickier to work with. There's a theoretical cycles-per-byte and throughput you should see, but you may not know what it is. You may need to contact the authors of the algorithm to find out. For example, we contacted the authors of BLAKE2 to learn whether our ARMv7/ARMv8 NEON implementation was performing as expected, because the authors' homepage had no benchmark results for it.
I've also found GCC 4.6 (and above) and -O3 can make a big difference in software-only implementations. That's because GCC heavily vectorizes at -O3, and you might witness a 2x to 2.5x speedup. For example, the compiler may generate code that runs at 40 cpb at -O2. At -O3 it may run at 15 or 19 cpb. A good SSE2 or NEON implementation should outperform the software-only implementation by at least a few cycles per byte. In the same example, the SSE2 or NEON implementation may run at 8 to 13 cpb.
There are also sites like OpenBenchmarking.org that may be able to provide some metrics for you.