python encryption rsa python-3.4 pycrypto

Decrypting Large files with RSA in pycrypto?


I have been using the pycrypto module to encrypt and decrypt with an RSA key pair. The problem arises when I try encrypting large files (a 10 kB text file): I read the file in 32-byte blocks and encrypt each block:

>>> f = open('10kb','rb')
>>> p = open('enc','wb')
>>> while True:
       data = f.read(32)
       if not data:
           break
       enc_data = public_key.encrypt(data,32)
       p.write(enc_data[0])
>>> p.close()
>>> f.close()

It gives the output:

128
128
... (many more 128s, one for each block it writes)

When I try to decrypt the encrypted file, I need to read it in 128-byte blocks so as to get back 32-byte blocks:

>>> f = open('enc','rb')
>>> p = open('dec','wb')
>>> while True:
       data = f.read(128)
       if not data:
           break
       dec_data = private_key.decrypt(data)
       p.write(dec_data)
>>> p.close()
>>> f.close()

It is giving the output:

32
32
... (many more 32-byte blocks decrypted, then)
128
128
128
128
Traceback (most recent call last):
  File "<pyshell#251>", line 5, in <module>
     enc_data = private_key.decrypt(data)
  File "/usr/lib/python3/dist-packages/Crypto/PublicKey/RSA.py", line 174,   in decrypt
     return pubkey.pubkey.decrypt(self, ciphertext)
  File "/usr/lib/python3/dist-packages/Crypto/PublicKey/pubkey.py", line 93, in decrypt
      plaintext=self._decrypt(ciphertext)
  File "/usr/lib/python3/dist-packages/Crypto/PublicKey/RSA.py", line 237, in _decrypt
      cp = self.key._blind(ciphertext, r)
  ValueError: Message too large

Up to the point where it outputs a block size of 32, it decrypts correctly, but once it starts outputting 128, it goes wrong. Why does it say "Message too large"? Is there a better and faster way to decrypt large text files using the pycrypto module?


Solution

  • Partial answer coming along ...


    RSA works on numbers. You only get bytes out of it when you serialize those long integers. Since those numbers don't have a fixed size, they are serialized with as many bytes as are necessary, but no more.

    An RSA encryption c = m^e mod n can produce ciphertexts that are so much smaller than n that not all the bytes are filled, because the leading zeros of the number don't have to be serialized.
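    As a toy illustration of this effect, the sketch below uses plain Python integers rather than pycrypto. The tiny textbook key (n = 3233, e = 17) and the helper names `minimal_bytes` and `encrypt_raw` are assumptions made for the example; `minimal_bytes` mimics a serialization that drops leading zero bytes.

```python
from collections import Counter

# Toy textbook RSA key (illustrative only, far too small for real use).
N = 61 * 53   # modulus 3233, which needs 2 bytes
E = 17        # public exponent


def minimal_bytes(value: int) -> bytes:
    """Serialize an integer big-endian with no leading zero bytes."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


def encrypt_raw(message: int) -> bytes:
    """Textbook RSA, c = m^e mod n, serialized without leading zeros."""
    return minimal_bytes(pow(message, E, N))


# Count how many ciphertexts serialize to 1 byte and how many to 2 bytes.
length_counts = Counter(len(encrypt_raw(m)) for m in range(2, N))
print(length_counts)
```

    Even though the modulus needs two bytes, every ciphertext value below 256 serializes to a single byte. The same thing happens, less often, with a 128-byte modulus.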

    Sometimes (depending on the modulus and plaintext) you may write a 127-byte chunk instead of a 128-byte chunk during encryption, but you always read a 128-byte chunk during decryption. That means you take one byte away from the next chunk. Once the alignment breaks, you can run into seemingly random behavior, such as a chunk being larger than the modulus and therefore not a valid ciphertext.
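    The misalignment can be reproduced without any cryptography. The sketch below assumes a hypothetical 2-byte modulus (`N = 3233`, with `CHUNK = 2` standing in for the 128-byte read size), and the ciphertext integers are made up for illustration.

```python
import io

N = 3233    # hypothetical toy modulus; its values serialize to CHUNK bytes
CHUNK = 2   # plays the role of the 128-byte read size


def minimal_bytes(value: int) -> bytes:
    """Serialize an integer big-endian with no leading zero bytes."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


# Made-up ciphertext integers; the second one is small enough to lose a byte.
ciphertexts = [3000, 200, 3100, 1500]
stream = io.BytesIO(b"".join(minimal_bytes(c) for c in ciphertexts))

# Read back in fixed-size chunks, as the decryption loop in the question does.
recovered = []
while True:
    data = stream.read(CHUNK)
    if not data:
        break
    recovered.append(int.from_bytes(data, "big"))

# Values at or above the modulus are not valid ciphertexts.
too_large = [value for value in recovered if value >= N]
print(recovered)
print(too_large)
```

    Only the first chunk survives. Everything after the short chunk is shifted by one byte, and some of the shifted values are at or above the modulus, which matches the "Message too large" symptom in the question.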

    There are two ways to solve that:

    1. Always write the length of the ciphertext chunk before it.

      Encryption:

      data = f.read(readsize)
      if not data:
          break
      i += 1
      enc_data = public_key.encrypt(data, 32)[0]
      
      # One length byte; in Python 3 a str from chr() cannot be written to a binary file.
      p.write(bytes([len(enc_data)]))
      p.write(enc_data)
      

      Decryption:

      length = f.read(1)
      if not length:
          break
      data = f.read(ord(length))
      print(length, len(data))
      j += 1
      dec_data = private_key.decrypt(data)
      p.write(dec_data[:readsize])
      

      At the end you have to trim the decrypted data to the original plaintext size, because you're working without PKCS#1 v1.5 padding or OAEP.

    2. Pad the zero bytes that are missing during encryption.

      Encryption:

      data = f.read(readsize)
      if not data:
          break
      i += 1
      enc_data = public_key.encrypt(data, 32)[0]
      
      while len(enc_data) < writesize:
          enc_data = b"\x00" + enc_data  # bytes literal, since enc_data is bytes in Python 3
      p.write(enc_data)
      

      Decryption:

      data = f.read(writesize)
      if not data:
          break
      j += 1
      dec_data = private_key.decrypt(data)
      p.write(dec_data[:readsize])
      

    Note that readsize = 127 and writesize = 128.
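    Both framing schemes can be tried end to end with a toy key. This is a minimal sketch and not the answer's original code: it uses plain Python integers instead of pycrypto, and the textbook key (n = 3233, e = 17, d = 2753), `READSIZE = 1`, `WRITESIZE = 2` and all function names are assumptions chosen so that it runs with the standard library only.

```python
import io

# Toy textbook RSA key, illustrative only: n = 61 * 53 and e * d = 1 (mod 3120).
N, E, D = 3233, 17, 2753
READSIZE = 1   # plaintext bytes per chunk; any 1-byte value is below N
WRITESIZE = 2  # bytes needed to hold any ciphertext below N


def encrypt_chunk(chunk: bytes) -> bytes:
    """Textbook RSA; the result is serialized without leading zero bytes."""
    value = pow(int.from_bytes(chunk, "big"), E, N)
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def decrypt_chunk(chunk: bytes) -> bytes:
    """Invert encrypt_chunk and restore the fixed plaintext chunk width."""
    value = pow(int.from_bytes(chunk, "big"), D, N)
    return value.to_bytes(READSIZE, "big")


def encrypt_prefixed(plaintext: bytes) -> bytes:
    """Variant 1: write a one-byte length before every ciphertext chunk."""
    out = bytearray()
    for i in range(0, len(plaintext), READSIZE):
        enc = encrypt_chunk(plaintext[i:i + READSIZE])
        out.append(len(enc))
        out += enc
    return bytes(out)


def decrypt_prefixed(blob: bytes) -> bytes:
    """Variant 1: read the length byte, then exactly that many bytes."""
    src, out = io.BytesIO(blob), bytearray()
    while True:
        length = src.read(1)
        if not length:
            break
        out += decrypt_chunk(src.read(length[0]))
    return bytes(out)


def encrypt_padded(plaintext: bytes) -> bytes:
    """Variant 2: left-pad every ciphertext chunk with zeros to WRITESIZE."""
    out = bytearray()
    for i in range(0, len(plaintext), READSIZE):
        out += encrypt_chunk(plaintext[i:i + READSIZE]).rjust(WRITESIZE, b"\x00")
    return bytes(out)


def decrypt_padded(blob: bytes) -> bytes:
    """Variant 2: read fixed WRITESIZE chunks; leading zeros are harmless."""
    out = bytearray()
    for i in range(0, len(blob), WRITESIZE):
        out += decrypt_chunk(blob[i:i + WRITESIZE])
    return bytes(out)


message = b"RSA works on numbers."
assert decrypt_prefixed(encrypt_prefixed(message)) == message
assert decrypt_padded(encrypt_padded(message)) == message
```

    Two things differ from the fragments above. `decrypt_chunk` writes each plaintext chunk back at a fixed width, and with `READSIZE = 1` there is never a partial final chunk, so the sketch sidesteps the question of trimming the last block.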


    Now, this is a partial answer, because this still leads to corrupt files, which are also too short, but at least it fixes the OP's error.