I am looking into the BoringSSL library in order to understand the implementation of AES:
https://github.com/google/boringssl/tree/master/crypto/fipsmodule/aes
In this folder the aes.c file is responsible for encrypting / decrypting etc. It usually takes a 128-bit input (and key) and provides an encrypted / decrypted output. It checks whether any hardware optimization is possible and eventually runs the right assembler code for the AES encryption/decryption.
There is also the file mode_wrappers.c which provides the different modes (CBC, CTR, etc.). But at which point is the input divided into the 128-bit blocks that are encrypted/decrypted with the aes.c file?
Are the CTR and the ECB-Mode parallelized in this library?
The part where the message input is split into blocks is only one #include away: you can see this happening in the modes source folder, e.g. this one for CTR mode.
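To make the splitting concrete, here is a minimal sketch of how a CTR-mode loop walks over the input in 16-byte steps. It is not BoringSSL's code: `toy_block_encrypt` is a hypothetical stand-in for the single-block AES call (it is built on SHA-256 and is not real AES), and `ctr_crypt` is a name invented for this illustration. The point is only the loop structure: one counter block is encrypted per 16 bytes of input, and the result is XORed onto that chunk.

```python
import hashlib

BLOCK_SIZE = 16  # AES block size in bytes (128 bits)


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for a single-block cipher call. NOT real AES."""
    assert len(block) == BLOCK_SIZE
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE]


def ctr_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data` in CTR mode (the operation is its own inverse)."""
    assert len(iv) == BLOCK_SIZE
    counter = int.from_bytes(iv, "big")
    out = bytearray()
    # This loop is where the message is divided into 128-bit blocks.
    for offset in range(0, len(data), BLOCK_SIZE):
        chunk = data[offset:offset + BLOCK_SIZE]  # last chunk may be shorter
        counter_block = (counter % (1 << 128)).to_bytes(BLOCK_SIZE, "big")
        keystream = toy_block_encrypt(key, counter_block)
        # zip() stops at the shorter input, so a partial final block
        # simply uses fewer keystream bytes.
        out.extend(c ^ k for c, k in zip(chunk, keystream))
        counter += 1
    return bytes(out)


if __name__ == "__main__":
    key = b"0123456789abcdef"
    iv = bytes(BLOCK_SIZE)
    message = b"a message that is longer than one block"
    ciphertext = ctr_crypt(key, iv, message)
    assert ctr_crypt(key, iv, ciphertext) == message
```

Note that each iteration depends only on the counter value, which is why CTR could in principle be parallelized; the sketch, like the loop described above, simply processes the blocks one after another.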
Note that BoringSSL is not meant as a generic crypto API:
Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is.
In general, AES isn't parallelized across threads, even if the mode allows for it. This is basically because "native" (hardware-accelerated) AES implementations are easily fast enough to keep up with any I/O. It can clearly be seen from the file indicated for CTR that no such parallelization takes place. Remember that TLS records are relatively small (at most 16 KiB of plaintext), so splitting them across threads and recombining the results would cost more time than it saves; what you want instead is low-latency packet encryption.
So for TLS it is fine to encrypt entire data records using single-threaded AES. If there are multiple TLS sessions, the different sessions will likely run on different threads / cores, so parallelization basically comes "for free".
In cases where CPU usage may become an issue, such as SSDs, AES is usually provided by dedicated hardware.