
What is the correct process for deriving the keys for PKZIP encryption and decryption?


The PKZIP stream cipher is a symmetric encryption scheme, where the secret key is required for both encryption and decryption. This key is used to produce a key stream of bytes for the one-time pad algorithm. The stream cipher has a 96-bit internal state split into three 32-bit values denoted key0, key1 and key2.

That is according to one description of the cipher.

But according to the PKWARE APPNOTE:

6.1.3 Each encrypted file has an extra 12 bytes stored at the start 
of the data area defining the encryption header for that file.  The
encryption header is originally set to random values, and then
itself encrypted, using three, 32-bit keys.  The key values are
initialized using the supplied encryption password.  After each byte
is encrypted, the keys are then updated using pseudo-random number
generation techniques in combination with the same CRC-32 algorithm
used in PKZIP and described elsewhere in this document.

6.1.4 The following are the basic steps required to decrypt a file:

1) Initialize the three 32-bit keys with the password.
2) Read and decrypt the 12-byte encryption header, further
   initializing the encryption keys.
3) Read and decrypt the compressed data stream using the
   encryption keys.

Initializing the encryption keys:

Key(0) <- 305419896
Key(1) <- 591751049
Key(2) <- 878082192

loop for i <- 0 to length(password)-1
    update_keys(password(i))
end loop

Where update_keys() is defined as:

update_keys(char):
  Key(0) <- crc32(key(0),char)
  Key(1) <- Key(1) + (Key(0) & 000000ffH)
  Key(1) <- Key(1) * 134775813 + 1
  Key(2) <- crc32(key(2),key(1) >> 24)
end update_keys
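
For reference, the quoted pseudocode can be sketched in Python roughly as follows. This is a minimal sketch, assuming that crc32(old, byte) means a single table-driven CRC-32 step with no initial or final bit inversion, and that all arithmetic wraps at 32 bits. The names CRC_TABLE, crc32_step and init_keys are illustrative and do not come from the APPNOTE.

```python
def _build_crc_table():
    """Build the standard reflected CRC-32 table (polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _build_crc_table()


def crc32_step(crc, byte):
    """Advance a raw CRC-32 state by one byte (no pre/post inversion)."""
    return CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)


def update_keys(keys, byte):
    """Apply update_keys() from the pseudocode to a (key0, key1, key2) tuple."""
    k0, k1, k2 = keys
    k0 = crc32_step(k0, byte)
    k1 = (k1 + (k0 & 0xFF)) & 0xFFFFFFFF       # keep key1 within 32 bits
    k1 = (k1 * 134775813 + 1) & 0xFFFFFFFF
    k2 = crc32_step(k2, k1 >> 24)              # feed the top byte of key1
    return (k0, k1, k2)


def init_keys(password: bytes):
    """Start from the three fixed constants, then mix in each password byte."""
    keys = (305419896, 591751049, 878082192)   # 0x12345678, 0x23456789, 0x34567890
    for b in password:
        keys = update_keys(keys, b)
    return keys
```

Under these assumptions, init_keys(b"1234") would give the key state that exists just before the first header byte is processed.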

The confusing part is where and how we get the three keys (key0, key1, key2).

Below is my sample data for testing steps 1 and 2 above.

password:

1234

12 bytes encrypted header:

3A CE 1D 8D E4 D1 ED D1 E5 08 4F EC

crc:

E07B8FC3

Now what is the correct way of getting the three 32-bit keys?

I need to understand this part in order to code my zip recovery software.


Solution

  • First off, you should know that this ancient PKWARE encryption scheme is very weak and should not be used for anything you actually need to keep protected against attack. You should instead use AES-256, whose use is also described in that PKWARE APPNOTE.

    Second, your question already shows the complete key-setup algorithm. The three 32-bit keys are generated from the password using those steps. The first three lines are the initial values of the three keys, if that's what you're asking. You then continue to use update_keys(), along with decrypt_byte() (defined in the same APPNOTE), to decrypt the 12-byte header and then the compressed data. When encrypting, the 12-byte header is generated randomly, except for the last byte, which should be the high byte of the file's CRC. This provides a quick check, albeit with a 1/256 false-positive rate, that the correct password was provided.
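
    The whole flow described above can be sketched in Python as follows. This is a rough sketch, not a reference implementation. It assumes crc32() is one raw CRC-32 step (obtained here by undoing the inversions that zlib.crc32 applies) and that decrypt_byte() is the APPNOTE's 16-bit `temp = Key(2) | 2` routine. The names ZipCrypto, make_header and password_plausible are made up for illustration.

```python
import os
import zlib


def crc32_step(crc, byte):
    """One raw CRC-32 step: cancel zlib's initial and final inversion."""
    return zlib.crc32(bytes([byte]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF


class ZipCrypto:
    """Key state for the traditional PKZIP stream cipher (illustrative)."""

    def __init__(self, password: bytes):
        # The three fixed starting values, then one update per password byte.
        self.k0, self.k1, self.k2 = 305419896, 591751049, 878082192
        for b in password:
            self._update(b)

    def _update(self, plain_byte):
        self.k0 = crc32_step(self.k0, plain_byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc32_step(self.k2, self.k1 >> 24)

    def _stream_byte(self):
        # decrypt_byte(): temp is a 16-bit value derived from key2.
        temp = (self.k2 | 2) & 0xFFFF
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            p = c ^ self._stream_byte()
            self._update(p)            # keys are always updated with plaintext
            out.append(p)
        return bytes(out)

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for p in data:
            out.append(p ^ self._stream_byte())
            self._update(p)
        return bytes(out)


def make_header(crc: int) -> bytes:
    """11 random bytes followed by the high byte of the file's CRC."""
    return os.urandom(11) + bytes([(crc >> 24) & 0xFF])


def password_plausible(password: bytes, enc_header: bytes, crc: int) -> bool:
    """Decrypt the 12-byte header and compare its last byte to the CRC's high byte."""
    header = ZipCrypto(password).decrypt(enc_header)
    return header[11] == ((crc >> 24) & 0xFF)
```

    With the sample values from the question, the check would be called as `password_plausible(b"1234", bytes.fromhex("3A CE 1D 8D E4 D1 ED D1 E5 08 4F EC"), 0xE07B8FC3)`. The same ZipCrypto object must keep decrypting the compressed data after the header, because the key state carries over. A failed check is not always conclusive: some archivers store the high byte of the file's modification time in that position instead when a data descriptor is used (general-purpose flag bit 3).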