Tags: c, encryption, encryption-symmetric

Using libsodium XChaCha20-Poly1305 for large files


I was looking through libsodium, and in particular at the symmetric encryption option XChaCha20-Poly1305. What I can't get my head around is that libsodium appears to provide no "context/update/finalise" style of working that you commonly find in crypto libraries.

The libsodium documentation makes it clear that there is "no practical limit" to the size of an XChaCha20-Poly1305 message. In practical terms, however, if I'm encrypting a multi-GB file, I'm not clear how you would use libsodium for that. Presumably you would only ever be passing the contents of the fread buffer to crypto_aead_xchacha20poly1305_ietf_encrypt, one read at a time?

IMPORTANT NOTE TO THOSE WHO THINK THIS IS OFF TOPIC

After bowing to peer pressure, I did delete this post. However I have re-opened it at the request of @MaartenBodewes who felt strongly that it was on-topic, and so strongly that he put in some effort into writing an answer. Therefore out of respect for his effort, I have undeleted the post. Please, spare me more "off-topic" comments, I've read enough of them!


Solution

  • In the introduction of libsodium it reads: "Its goal is to provide all of the core operations needed to build higher-level cryptographic tools."

    Libsodium is therefore a relatively high-level library that provides only limited access to the underlying structures.


    That said, encrypting such large files with an authenticated cipher has some inherent difficulties. Either you first verify the authenticity of the whole ciphertext and only then start to decrypt, which takes two passes over the data, or you decrypt online before the authentication tag has been verified. The latter means that you write out unverified plaintext and have to destroy it if verification fails.

    Generally you can get around that by encrypting in blocks of, say, 16 KiB and adding an authentication tag to each block. You would need to make sure that you increase the nonce for every block, so that the same key and nonce pair, and with it the key stream, never repeats. This adds some overhead, but nothing spectacular: a 16-byte tag per 16 KiB block is about 0.1%, and you'd have some overhead anyway. The disadvantage is that you can no longer decrypt in place, as that would leave gaps where the tags were.
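As a rough illustration of the bookkeeping this involves, here is a minimal sketch in plain C (no libsodium calls yet). The nonce layout, a 16-byte per-file prefix followed by a 64-bit little-endian block index, is only one possible choice and is an assumption of this sketch, as are the names chunk_nonce, chunk_count and encrypted_size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NONCE_BYTES 24            /* XChaCha20-Poly1305 nonce size */
#define TAG_BYTES   16            /* Poly1305 tag size */
#define CHUNK_BYTES (16 * 1024)   /* plaintext bytes per block, as in the text */

/* Build the nonce for block `index`: a 16-byte per-file prefix followed by
 * the block index as a 64-bit little-endian counter (assumed layout). */
static void chunk_nonce(uint8_t nonce[NONCE_BYTES],
                        const uint8_t prefix[16], uint64_t index)
{
    memcpy(nonce, prefix, 16);
    for (int i = 0; i < 8; i++) {
        nonce[16 + i] = (uint8_t)(index >> (8 * i));
    }
}

/* Number of blocks needed for `plain_len` bytes. An empty file still gets
 * one (empty) block so that it carries a tag. */
static uint64_t chunk_count(uint64_t plain_len)
{
    if (plain_len == 0) {
        return 1;
    }
    return (plain_len + CHUNK_BYTES - 1) / CHUNK_BYTES;
}

/* Total ciphertext size: the plaintext plus one tag per block. */
static uint64_t encrypted_size(uint64_t plain_len)
{
    return plain_len + chunk_count(plain_len) * TAG_BYTES;
}
```

For a 1 GiB file this gives 65536 blocks and 1 MiB of tags.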

    You could also store all the authentication tags at the end if you want to make a really advanced scheme. Or buffer all the authentication tags in memory and calculate a single (HMAC) tag over all the collected tags.

    So calling crypto_aead_xchacha20poly1305_ietf_encrypt multiple times could be considered an option. If you go that way, you may want to calculate a file-specific key so that you can start your nonce at zero.


    If you just want confidentiality of the stored file, you could consider leaving out the authentication tag. In that case you can manually set the block counter used to create the key stream, using crypto_stream_xchacha20_xor_ic, which takes the initial counter value as its ic argument.

    This permits direct access to any block without having to compute the previous ones.

    Obviously you can still add an authentication tag using HMAC-SHA-2, which is also available in libsodium, but this will be rather slower than using Poly1305.


    Finally, libsodium is open source. If you're exceedingly brave you could go into the gory details and construct your own context/update/finalize. The algorithm certainly supports it (hint: never buffer the authentication tag or nonce during decryption routines if you go this route - directly decrypt).