Let's say I am creating a webapp where users can create a nested tree of strings (with sensitive information). These strings are presumably quite short. I want to encrypt both keys and values in this tree before saving it. All values in the tree will be encrypted client-side using a symmetric key supplied by the user. Likewise, they will be decrypted client-side when reading.
The tree is persisted in a Mongo database.
I can't decide whether I should serialize the tree and encrypt it as a whole string or encrypt the values individually, considering that all data in the tree will be encrypted using the same key.
What are the pros and cons of either?
From what I can tell, AES uses a block size of 128 bits, meaning that padding can add up to a full block (16 bytes) to any string when it is encrypted, which speaks in favor of encrypting a serialized string (if you want to avoid overhead).
Note: Although the webapp will use HTTPS, IP whitelisting, and multifactor authentication, I want to make an effort to prevent a data breach in the event the Mongo database is stolen. That's what I'm going for here. Advice or thoughts on how to accomplish this are appreciated.
Furthermore, I also want my service to inspire trust. Sending data in the clear (although over HTTPS) means the user must trust me to encrypt it before persisting it. Encrypting client-side allows me to emphasize that I don't know (or need to know) what I'm saving.
I can't think of a reason why these approaches would differ in the security of the actual strings (assuming both are implemented correctly). Encrypting the strings individually obviously means that the structure of the tree will not be secret, but I'm not sure whether you are concerned with that. For example, if you encrypt each string individually, someone seeing the ciphertexts could find out how many keys there are in the tree, and they could also learn something about the length of each key and value. If you encrypt the tree as a whole serialized blob, then someone seeing the ciphertext can tell roughly how much data is in the tree but nothing about the lengths or number of individual keys/values.
In terms of overhead, the padding would be a consideration, as you mentioned. A bigger source of storage overhead is IVs: if you are using a block cipher mode such as CTR, you need to use a distinct IV for each ciphertext. This means if you are encrypting each string individually, you need to store an IV for each string. If you encrypt the whole serialized tree, then you just need to store the one IV for that one ciphertext.
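To put rough numbers on that overhead, here is a small back-of-the-envelope sketch. It assumes PKCS#7 padding (which always adds between 1 and 16 bytes) and one 16-byte IV stored alongside every ciphertext; the sample tree and the helper names are made up for illustration. A mode like CTR would drop the padding term but still pay for the IV.

```javascript
const BLOCK = 16;    // AES block size in bytes
const IV_BYTES = 16; // assumption: one 16-byte IV stored per ciphertext

// Ciphertext length under PKCS#7 padding: rounds up to the next full
// block, and adds a whole extra block when the input is already aligned.
function paddedLength(plainBytes) {
  return (Math.floor(plainBytes / BLOCK) + 1) * BLOCK;
}

// Total bytes stored for one ciphertext, including its IV.
function storedLength(plainBytes) {
  return IV_BYTES + paddedLength(plainBytes);
}

// UTF-8 byte length of a string.
const byteLength = (s) => new TextEncoder().encode(s).length;

// Hypothetical tree, and the same tree flattened to its key/value strings.
const tree = { name: "alice", email: "alice@example.com" };
const strings = ["name", "alice", "email", "alice@example.com"];

// Option 1: every key and value is its own ciphertext with its own IV.
const perString = strings.reduce(
  (sum, s) => sum + storedLength(byteLength(s)),
  0
);

// Option 2: the whole tree is serialized and encrypted once.
const wholeTree = storedLength(byteLength(JSON.stringify(tree)));

console.log({ perString, wholeTree }); // { perString: 144, wholeTree: 64 }
```

For short strings, the per-string IV and padding dominate: each 4 or 5 byte string here costs 32 bytes to store, so the gap widens as the tree grows.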
Before you implement this in JavaScript, though, you should make sure that you're actually getting a real improvement in security from doing client-side encryption. This article is a classic: http://www.matasano.com/articles/javascript-cryptography/ One important point to remember is that the server provides the JavaScript encryption code, so encrypting data on the client doesn't protect it from the server. If your main concern is a stolen database, you could achieve the same security by just encrypting the data on the server before inserting it into the database.
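If you do go the server-side route, a minimal sketch with Node's built-in crypto module might look like the following. Note the assumptions: it uses AES-256-GCM (an authenticated mode, swapped in here instead of CTR so that tampering is detected), it encrypts the whole serialized tree as one blob, and the function names and record layout are hypothetical. Key management, meaning where the 32-byte key lives so that it is not stolen along with the database, is deliberately left out.

```javascript
const nodeCrypto = require("crypto");

// Encrypt the whole tree as one serialized blob with a fresh random IV.
function encryptTree(tree, key) {
  const iv = nodeCrypto.randomBytes(12); // 96-bit IV, the usual size for GCM
  const cipher = nodeCrypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([
    cipher.update(JSON.stringify(tree), "utf8"),
    cipher.final(),
  ]);
  // Store the IV and authentication tag next to the ciphertext.
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Reverse of encryptTree; throws if the record was modified or the key is wrong.
function decryptTree(record, key) {
  const decipher = nodeCrypto.createDecipheriv(
    "aes-256-gcm",
    key,
    Buffer.from(record.iv, "base64")
  );
  decipher.setAuthTag(Buffer.from(record.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(record.data, "base64")),
    decipher.final(),
  ]);
  return JSON.parse(plain.toString("utf8"));
}

// Usage: the key must be 32 bytes and kept outside the database.
const key = nodeCrypto.randomBytes(32);
const record = encryptTree({ accounts: { bank: "1234" } }, key);
console.log(decryptTree(record, key)); // { accounts: { bank: '1234' } }
```

The returned record is a plain object of base64 strings, so it can be stored as a single Mongo document; an attacker with only the database sees one opaque blob per tree.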