Tags: encryption, profiling, bouncycastle

Code profiling to improve performance: see CPU cycles inside mscorlib.dll?


I wrote a small benchmark comparing the AES implementation in .NET's System.Security.Cryptography with BouncyCastle.Org's AES.

Link to GitHub code: https://github.com/sidshetye/BouncyBench

I'm particularly interested in AES-GCM since it's a 'better' crypto mode (authenticated encryption) and .NET is missing it. What I noticed was that while the plain AES implementations perform very similarly in .NET and BouncyCastle, BouncyCastle's GCM performance is quite poor: roughly 17-19x slower than its own AES-CBC in the results below (see the extra background for more). I suspect it's due to excessive buffer copying or something similar. To look deeper, I tried profiling the code (VS2012: Analyze menu > Launch Performance Wizard) and noticed that there was a LOT of CPU burn inside mscorlib.dll.


Question: How can I figure out what's eating most of the CPU in such a case? Right now all I know is that "some lines/calls in Init() burn 47% of CPU inside mscorlib.ni.dll", but without knowing which specific lines, I don't know where to (try to) optimize. Any clues?

Extra background:

In the paper "The Galois/Counter Mode of Operation (GCM)" by David A. McGrew, I read: "Multiplication in a binary field can use a variety of time-memory tradeoffs. It can be implemented with no key-dependent memory, in which case it will generally run several times slower than AES. Implementations that are willing to sacrifice modest amounts of memory can easily realize speeds greater than that of AES."
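To make that time-memory tradeoff concrete, here is a minimal Python sketch (Python only for illustration; the benchmark itself is C#). `gf128_mul` is the bit-serial multiply described in NIST SP 800-38D: no key-dependent memory, but 128 shift/XOR steps per block. `build_tables` and `gf128_mul_table` are hypothetical helpers showing one simple tradeoff: precompute 64 KB of multiples of the hash subkey H once per key, then multiply with 16 lookups. This is not BouncyCastle's code; the names and the byte-wise table layout are assumptions.

```python
# Two ways to multiply in GF(2^128) the way GCM does (illustrative sketch only).
# A 16-byte block is held as a 128-bit int via int.from_bytes(block, "big"),
# so the block's first (leftmost) bit is the int's most significant bit.

R = 0xE1 << 120  # GCM reduction constant: 11100001 followed by 120 zero bits


def gf128_mul(x: int, y: int) -> int:
    """Bit-serial multiply (NIST SP 800-38D, Algorithm 1).

    Uses no key-dependent memory, but does 128 shift/XOR steps per block.
    """
    z, v = 0, y
    for i in range(127, -1, -1):  # scan x from its leftmost bit
        if (x >> i) & 1:
            z ^= v
        # Multiply v by the polynomial "x": right shift, reduce if a bit fell off
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def build_tables(h: int) -> list:
    """Hypothetical 64 KB precomputation for a fixed hash subkey H.

    tables[j][b] = (byte value b at byte position j) * H: 16 x 256 entries of
    16 bytes each. Must be rebuilt whenever the key (and therefore H) changes.
    """
    return [[gf128_mul(b << (8 * (15 - j)), h) for b in range(256)]
            for j in range(16)]


def gf128_mul_table(x: int, tables: list) -> int:
    """Multiply x by H with 16 lookups and XORs (multiplication is linear in x)."""
    z = 0
    for j in range(16):
        byte = (x >> (8 * (15 - j))) & 0xFF
        z ^= tables[j][byte]
    return z
```

The catch is that the tables depend on the key, so they have to be rebuilt on every key setup (4,096 bit-serial multiplies in this sketch). If a benchmark re-initialises the cipher for every 10-byte message, that kind of per-key setup can dominate; whether BouncyCastle's Init() does something comparable is an unverified guess worth checking in its source.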

If you look at the results, the basic AES-CBC engine performance is very comparable between the two libraries. AES-GCM reuses the underlying AES engine in CTR mode (potentially faster than CBC, since CTR blocks can be computed in parallel and need no padding). However, on top of CTR, GCM adds a multiplication in the field GF(2^128) for every 16-byte block (the GHASH authentication step), so there could be other sources of slowdown. Anyway, that's why I tried profiling the code.
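The per-block work GCM adds on top of CTR is GHASH. Below is a minimal Python sketch of GHASH following NIST SP 800-38D, reusing the bit-serial multiply; `ghash` is an illustrative name, blocks are big-endian 128-bit integers, and the AES/CTR part (computing H = E_K(0^128), the CTR keystream, and the final XOR with E_K(J0) that produces the tag) is deliberately left out.

```python
R = 0xE1 << 120  # GCM reduction constant: 11100001 followed by 120 zero bits


def gf128_mul(x: int, y: int) -> int:
    """Bit-serial GF(2^128) multiply (NIST SP 800-38D, Algorithm 1)."""
    z, v = 0, y
    for i in range(127, -1, -1):  # scan x from its leftmost bit
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash(h: int, aad: bytes, ciphertext: bytes) -> int:
    """GHASH_H(A, C): one multiply per 16-byte block, plus one for the lengths."""

    def blocks(data: bytes):
        # Split into 16-byte blocks, zero-padding the last partial block
        for off in range(0, len(data), 16):
            yield int.from_bytes(data[off:off + 16].ljust(16, b"\x00"), "big")

    y = 0
    for block in blocks(aad):
        y = gf128_mul(y ^ block, h)
    for block in blocks(ciphertext):
        y = gf128_mul(y ^ block, h)
    # Final block: bit lengths of A and C as two 64-bit big-endian integers
    lengths = ((8 * len(aad)) << 64) | (8 * len(ciphertext))
    return gf128_mul(y ^ lengths, h)
```

For a 10-byte message with no associated data this is only two multiplications (one ciphertext block plus the length block), so per-message GHASH work alone seems unlikely to explain the gap on tiny inputs; per-key or per-message setup looks like the more plausible suspect, which would fit the profile pointing at Init().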

For those interested, here are my quick benchmark results. They were run inside a Windows 8 VM, so YMMV. The test is configurable, but the current settings simulate the crypto overhead of encrypting many database cells (i.e., many small plaintext inputs).

Creating initial random bytes ...
Benchmark test is : Encrypt=>Decrypt 10 bytes 100 times

Name                time (ms)   plain (bytes)   encrypted (bytes)   byte overhead

.NET ciphers
AES128                 1.5969              10                  32           220 %
AES256                 1.4131              10                  32           220 %
AES128-HMACSHA256      2.5834              10                  64           540 %
AES256-HMACSHA256      2.6029              10                  64           540 %

BouncyCastle Ciphers
AES128/CBC             1.3691              10                  32           220 %
AES256/CBC             1.5798              10                  32           220 %
AES128-GCM            26.5225              10                  42           320 %
AES256-GCM            26.3741              10                  42           320 %

R - Rerun tests
C - Change size(10) and iterations(100)
Q - Quit
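The "encrypted (bytes)" and "byte overhead" columns can be reproduced with simple arithmetic, where overhead = (encrypted - plain) / plain. The ciphertext layouts in this Python sketch (a 16-byte IV prepended plus PKCS#7 padding for CBC, a 32-byte HMAC-SHA256 tag appended, and a 16-byte nonce plus 16-byte tag for GCM) are assumptions inferred from the numbers, not read from the BouncyBench source; the function names are hypothetical.

```python
BLOCK = 16  # AES block size in bytes


def cbc_len(plain: int, iv: int = 16) -> int:
    # Assumed: IV prepended, PKCS#7 padding (always adds 1..16 bytes)
    padded = (plain // BLOCK + 1) * BLOCK
    return iv + padded


def cbc_hmac_len(plain: int, mac: int = 32) -> int:
    # Assumed: full 32-byte HMAC-SHA256 tag appended to the CBC output
    return cbc_len(plain) + mac


def gcm_len(plain: int, nonce: int = 16, tag: int = 16) -> int:
    # CTR-based, so no padding: nonce + ciphertext (same length as plaintext) + tag
    return nonce + plain + tag


def overhead_pct(plain: int, encrypted: int) -> float:
    return 100.0 * (encrypted - plain) / plain


for name, fn in [("AES-CBC", cbc_len),
                 ("AES-CBC + HMAC-SHA256", cbc_hmac_len),
                 ("AES-GCM", gcm_len)]:
    n = fn(10)
    print(f"{name:24} {n:3d} bytes  {overhead_pct(10, n):.0f} %")
```

Under these assumptions, GCM's overhead is a fixed 32 bytes regardless of message length, because CTR mode needs no padding and only the nonce and tag are added.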

Solution

  • This is a rather lame move from Microsoft: they broke a feature that worked well before Windows 8 (attributing profiler samples to the functions executing inside NGEN'd framework images such as mscorlib.ni.dll), as explained in this MSDN blog post:

    On Windows 8 the profiler uses a different underlying technology than what it does on previous versions of Windows, which is why the behavior is different on Windows 8. With the new technology, the profiler needs the symbol file (PDB) to know what function is currently executing inside NGEN’d images.

    (...)

    It is however on our backlog to implement in the next version of Visual Studio.

    The post gives directions for generating the PDB files for the NGEN'd images yourself (thanks!).