I am creating an encryption scheme with AES in CBC mode with a 256-bit key. Before I learned about CBC mode and initialization vectors, I was planning on creating a 32-bit salt for each act of encryption and storing the salt. The password/entered key would then be padded with this salt up to 32 bits.
I.e., if the pass/key entered was "tree", instead of padding it with 28 0s, it would be padded with the first 28 chars of this salt.
However, this was before I learned of the IV, which is also called a salt in some places. The question now is whether this earlier method of salting has become redundant in principle with the IV. I am assuming that the salt and the IV would both be stored with the ciphertext, so a theoretical brute-force attack would not be deterred at all.
Storing this salt and using it rather than 0s is a step that involves some effort, so I think it is worth asking whether it is a practically useless measure. It is not as though, with current knowledge, any brute-force decryption tables could be made for AES, and even a 16-bit salt makes the creation of MD5 tables painful.
Thanks, Elijah
It's good that you know CBC, as it is certainly better than using ECB mode encryption (although even better modes such as the authenticated modes GCM and EAX exist as well).
The following answer will explain how to handle the password, salt, key and IV. It is however recommended to find a higher level protocol and library that implements password based encryption and even better to avoid password based encryption at all.
IV's & salts are often confused, though they are separate terms. In the question, there is confusion about bits & bytes, key- & block size and rainbow- & MD5 tables. It is important to understand the terms before implementing cryptographic protocols.
Keys and passwords are not the same. Normally you create a key used for symmetric encryption out of a password using a (password based) key derivation function or PBKDF. The most common one discussed here is PBKDF2 (password based key derivation function #2), which is used for PBE (password based encryption). This is defined in the latest, open PKCS#5 standard by RSA labs. Before feeding the password into the function, you need to check that it is correctly translated into bytes (character encoding). Other PBKDF's are bcrypt, scrypt and Argon2; the latter two allow for specifying the memory usage so that they cannot be as easily attacked by specialized hardware.
The salt is used as another input of the key derivation function. It is used to prevent attacks using "rainbow tables", where keys are pre-computed for specific passwords. Because of the salt, the attacker cannot use pre-computed values, as it is infeasible to generate a table for each possible salt. The salt should normally be 8 bytes (64 bits) or longer; a 128 bit salt is a safe choice. The salt also ensures that identical passwords (of different users) do not derive the same key.
The output of the key derivation function is a secret of dkLen bytes, where dkLen is the length of the key to generate, in bytes. As an AES key does not contain anything other than these bytes, the AES key will be identical to the generated secret. dkLen should be 16, 24 or 32 bytes for the key lengths of AES: 128, 192 or 256 bits.
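The derivation described above can be sketched with Python's standard hashlib.pbkdf2_hmac. The function name derive_key, the 16-byte salt, the choice of SHA-256 and the iteration count of 600,000 are illustrative assumptions, not values prescribed by this answer; tune the iteration count to your own hardware.

```python
import hashlib
import secrets

def derive_key(password: str, salt: bytes, dk_len: int = 32,
               iterations: int = 600_000) -> bytes:
    """Derive an AES key of dk_len bytes from a password (PBKDF2-HMAC-SHA-256)."""
    if dk_len not in (16, 24, 32):
        raise ValueError("AES keys are 16, 24 or 32 bytes long")
    # Fix the character encoding explicitly so the same password always
    # maps to the same bytes.
    password_bytes = password.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", password_bytes, salt,
                               iterations, dklen=dk_len)

salt = secrets.token_bytes(16)  # random 128-bit salt, stored next to the ciphertext
key = derive_key("tree", salt)  # 32 bytes, usable directly as an AES-256 key
```

The same password with the same salt always yields the same key, while a different salt yields an unrelated key, which is why the salt has to be stored alongside the ciphertext.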
OK, so now you finally have an AES key to use. However, if you simply encrypt each plain text block with this key, you will get identical results for matching plain text blocks. CBC mode gets around this by XOR'ing the next plain text block with the last encrypted block before doing the encryption. That last encrypted block is the "vector". This does not work for the first block, because in that case there is no last encrypted block. This is why you need to specify the first vector: the "initialization vector" or IV.
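To make the chaining concrete, here is a small sketch of CBC encryption over a pluggable block function. The names cbc_encrypt, xor_bytes and toy_block_cipher are hypothetical, and toy_block_cipher is only a stand-in for AES under a fixed key so the example runs with the standard library; it offers no security whatsoever. In real code, use the CBC mode of a vetted cryptographic library instead.

```python
from typing import Callable

BLOCK_SIZE = 16  # AES block size in bytes

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(encrypt_block: Callable[[bytes], bytes],
                iv: bytes, plaintext: bytes) -> bytes:
    """CBC chaining; plaintext must already be padded to a multiple of 16 bytes."""
    if len(iv) != BLOCK_SIZE or len(plaintext) % BLOCK_SIZE != 0:
        raise ValueError("IV must be 16 bytes and plaintext a multiple of 16 bytes")
    previous = iv  # the first vector is the IV
    blocks = []
    for i in range(0, len(plaintext), BLOCK_SIZE):
        block = plaintext[i:i + BLOCK_SIZE]
        # XOR with the previous ciphertext block, then encrypt.
        previous = encrypt_block(xor_bytes(block, previous))
        blocks.append(previous)
    return b"".join(blocks)

def toy_block_cipher(block: bytes) -> bytes:
    # Hypothetical stand-in for AES with a fixed key. NOT secure, demo only.
    return bytes((b * 7 + 3) % 256 for b in block)

plaintext = b"A" * 32  # two identical 16-byte blocks
ciphertext = cbc_encrypt(toy_block_cipher, bytes(16), plaintext)
# Thanks to the chaining, the two ciphertext blocks differ even though
# the two plaintext blocks are identical.
```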
The block size of AES is 128 bits / 16 bytes independent of the key size. So the vectors, including the initialization vector, need to be 16 bytes as well. Now, if you only use the key to encrypt e.g. a single file, then the IV could simply contain 16 bytes with the value 00h, or any other 16-byte constant value.
The use of a constant IV does not work for multiple files, because if the files contain the same text, you will be able to detect that the first part of the encrypted file is identical. This is why you need to specify a different IV for each encryption you perform with the key. For CBC-mode the IV needs to be fully unpredictable to an adversary. Most commonly the IV is generated using a cryptographically secure random generator which is normally included with cryptographic API's or the language runtime classes. Of course that means that the other party also needs to learn about the fresh IV; for this reason the IV is commonly prefixed to the ciphertext.
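Generating the IV and carrying it along with the ciphertext might look like the sketch below. The helper names new_iv, pack_message and unpack_message are hypothetical, and the AES-CBC call itself is deliberately left out: the ciphertext argument stands for whatever bytes your cryptographic library returns.

```python
import secrets

IV_SIZE = 16  # one AES block

def new_iv() -> bytes:
    """Fresh, unpredictable IV from the OS's cryptographically secure generator."""
    return secrets.token_bytes(IV_SIZE)

def pack_message(iv: bytes, ciphertext: bytes) -> bytes:
    """Prefix the IV to the ciphertext so the receiver can recover it."""
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be 16 bytes")
    return iv + ciphertext

def unpack_message(message: bytes) -> tuple[bytes, bytes]:
    """Split a received message back into (iv, ciphertext)."""
    if len(message) < IV_SIZE:
        raise ValueError("message too short to contain an IV")
    return message[:IV_SIZE], message[IV_SIZE:]
```

The IV does not need to be kept secret; it only needs to be unpredictable before encryption and available to whoever decrypts.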
Now there is one trick that might allow you to use all zeros for the IV all the time: for each plain text you encrypt using AES-CBC, you could calculate a key using the same password but a different salt, generating a fresh key for each file. In that case, you will only use the resulting key for a single piece of information. If a higher level API only allows you to provide a salt, then altering the salt is the way to go.
Sometimes additional output of the PBKDF is used to derive the IV. This way the official recommendation - that the IV for CBC should not be predictable by an adversary - is fulfilled. You should however make sure that you do not ask for more output from the PBKDF2 function than the underlying hash function can deliver. PBKDF2 has weaknesses that would enable an adversary to gain an advantage in such a situation.
So do not ask for more than 256 bits if SHA-256 is used as the hash function for PBKDF2. Note that SHA-1 - with an output of 160 bits - is the common default for PBKDF2, and that only allows for a single 128 bit AES key, leaving no room for extracting an IV for CBC-mode.
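One way to stay within that limit while still deriving both values is to pick a hash with a larger output. The sketch below assumes SHA-512 (64 bytes of output), which leaves room for a 32-byte AES-256 key plus a 16-byte IV; the function name derive_key_and_iv and the iteration count are illustrative assumptions.

```python
import hashlib

def derive_key_and_iv(password: str, salt: bytes,
                      iterations: int = 600_000) -> tuple[bytes, bytes]:
    """Derive a 32-byte AES key and a 16-byte CBC IV in one PBKDF2 call."""
    hash_name = "sha512"
    key_len, iv_len = 32, 16
    # Never request more bytes than a single hash output provides.
    if key_len + iv_len > hashlib.new(hash_name).digest_size:
        raise ValueError("requested output exceeds the hash output size")
    material = hashlib.pbkdf2_hmac(hash_name, password.encode("utf-8"),
                                   salt, iterations, dklen=key_len + iv_len)
    return material[:key_len], material[key_len:]
```

Because the IV is now a function of the password and the salt, the salt has to be fresh and random for every encryption; reusing a salt would reuse both the key and the IV.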