
python encrypt big file


This script is an XOR encryption function. It works fine on small files, but when I try to open and encrypt a big file (about 5 GB), I get this error:

"OverflowError: size does not fit in an int"

and opening the file is also very slow.

Can anyone help me optimize my script? Thank you.

from Crypto.Cipher import XOR
import base64
import os


def encrypt():
    enpath = "D:\\Software"
    key = 'vinson'
    for files in os.listdir(enpath):
        os.chdir(enpath)
        with open(files, 'rb') as r:
            print("open success", files)
            data = r.read()
            print("loading success", files)
            r.close()
            cipher = XOR.new(key)
            encoding = base64.b64encode(cipher.encrypt(data))
            with open(files, 'wb+') as n:
                n.write(encoding)
                n.close()

Solution

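  • To expand on my comment: you don't want to read the whole file into memory at once; process it in smaller blocks instead. Reading a 5 GB file in one call needs at least 5 GB of RAM, the encrypted copy and its Base64 encoding need more on top of that, and the cipher is handed one enormous buffer, which is presumably where the "size does not fit in an int" error comes from.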

    With a production-grade block cipher used in a mode such as CBC (and XOR is definitely not production-grade), you would also need to pad the data if its length is not a multiple of the cipher's block size. This script doesn't handle that, hence the assertion about the block size.

    Also, we're no longer irreversibly overwriting files with their encrypted versions (well, aside from the fact that the XOR cipher is directly reversible). If you do want that, it's better to write the encrypted file first, then remove the original and rename the encrypted file into its place (os.replace() does both in one step). That way you won't end up with a half-written, half-encrypted file if something fails midway.

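    Also, I removed the useless Base64 encoding: the output goes to a binary file anyway, and Base64 turns every 3 bytes into 4, making the output about a third larger.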

    But don't use this code for anything serious. Please don't. Friends don't let friends roll their own crypto.

    from Crypto.Cipher import XOR
    import os
    
    
    def encrypt_file(cipher, source_file, dest_file):
        # this toy script is unable to deal with padding issues,
        # so we must have a cipher that doesn't require it:
        assert cipher.block_size == 1
    
        while True:
            src_data = source_file.read(1048576)  # 1 megabyte at a time
            if not src_data:  # ran out of data?
                break
            encrypted_data = cipher.encrypt(src_data)
            dest_file.write(encrypted_data)
    
    
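    def insecurely_encrypt_directory(enpath, key):
        for filename in os.listdir(enpath):
            file_path = os.path.join(enpath, filename)
            # os.listdir() also returns subdirectories: skip anything that isn't a regular
            # file, and don't re-encrypt the output of a previous run.
            if not os.path.isfile(file_path) or filename.endswith(".encrypted"):
                continue
            dest_path = file_path + ".encrypted"
            with open(file_path, "rb") as source_file, open(dest_path, "wb") as dest_file:
                cipher = XOR.new(key)  # fresh cipher per file, so each file starts at key offset 0
                encrypt_file(cipher, source_file, dest_file)


    if __name__ == "__main__":
        # Example call using the directory and key from the question.
        insecurely_encrypt_directory("D:\\Software", b"vinson")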
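If PyCrypto isn't available (it is no longer maintained, and as far as I know its successor PyCryptodome dropped the `XOR` module), the same block-by-block idea can be sketched in pure Python. The names `RepeatingKeyXor`, `xor_file` and `CHUNK_SIZE` are hypothetical, and I haven't compared the output byte-for-byte with PyCrypto's `XOR`. The important detail is that the cipher remembers its position in the key between blocks; otherwise every block would restart at the first key byte. It is still a toy, not encryption to rely on.

```python
class RepeatingKeyXor:
    """Toy repeating-key XOR that keeps its key offset between calls. NOT secure."""

    def __init__(self, key: bytes) -> None:
        if not key:
            raise ValueError("key must not be empty")
        self._key = key
        self._pos = 0  # index into the key for the next byte to process

    def process(self, data: bytes) -> bytes:
        n = len(data)
        klen = len(self._key)
        # Keystream of exactly n bytes, starting at the current key offset.
        keystream = (self._key * (n // klen + 2))[self._pos:self._pos + n]
        self._pos = (self._pos + n) % klen
        # XOR the whole chunk at once via big integers instead of a per-byte Python loop.
        mixed = int.from_bytes(data, "big") ^ int.from_bytes(keystream, "big")
        return mixed.to_bytes(n, "big")


CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time


def xor_file(src_path: str, dest_path: str, key: bytes) -> None:
    """Stream src_path through the XOR cipher into dest_path, one chunk at a time."""
    xor = RepeatingKeyXor(key)
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:  # end of file
                break
            dest.write(xor.process(chunk))
```

Because XOR is its own inverse, running `xor_file` on the `.encrypted` output with the same key gives back the original bytes, and memory use stays around one chunk regardless of file size.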
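On the padding point: a minimal sketch of PKCS#7 padding (the scheme described in RFC 5652, section 6.3) appends N bytes, each of value N, where N is between 1 and the block size. When streaming a file, only the final chunk gets padded, so the chunk size has to be a multiple of the block size. The helper names (`pkcs7_pad`, `pkcs7_unpad`, `padded_chunks`) are hypothetical; in real code you would use a library's implementation (for example `Crypto.Util.Padding` in PyCryptodome), or a mode such as CTR or GCM that needs no padding at all.

```python
from typing import BinaryIO, Iterator


def pkcs7_pad(data: bytes, block_size: int) -> bytes:
    """Append N bytes of value N (1 <= N <= block_size) so the length is a multiple of block_size."""
    if not 1 <= block_size <= 255:
        raise ValueError("block_size must be between 1 and 255")
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block_size: int) -> bytes:
    """Strip PKCS#7 padding, rejecting malformed input."""
    if not data or len(data) % block_size:
        raise ValueError("padded length is not a multiple of the block size")
    n = data[-1]
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]


def padded_chunks(f: BinaryIO, chunk_size: int, block_size: int) -> Iterator[bytes]:
    """Yield chunks from f; only the last (short or empty) chunk is padded."""
    if chunk_size % block_size:
        raise ValueError("chunk_size must be a multiple of block_size")
    while True:
        chunk = f.read(chunk_size)
        if len(chunk) < chunk_size:  # end of file reached
            yield pkcs7_pad(chunk, block_size)
            return
        yield chunk
```

If the file length is an exact multiple of the block size, a whole block of padding is still added; that is what lets `pkcs7_unpad` always tell padding apart from data.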
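The "write a new file, then swap it into place" approach for overwriting originals can be sketched with `os.replace()`, which overwrites an existing destination and, on POSIX, does so atomically. This assumes the transform is a function that reads from one binary file object and writes to another (the shape of `encrypt_file` above); `replace_with_transformed` is a hypothetical name.

```python
import os
import shutil
import tempfile
from typing import BinaryIO, Callable


def replace_with_transformed(path: str, transform: Callable[[BinaryIO, BinaryIO], None]) -> None:
    """Run transform(src, dest) into a temp file, then swap it over `path`.

    If anything fails, the original file is left untouched and the temp file is removed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory => same filesystem; os.replace() cannot move across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # Wrap the fd first so it is always closed, even if opening `path` fails.
        with os.fdopen(fd, "wb") as dest, open(path, "rb") as src:
            transform(src, dest)
            dest.flush()
            os.fsync(dest.fileno())  # ensure the bytes reach the disk before the swap
        shutil.copymode(path, tmp_path)  # mkstemp creates the file owner-only (0600)
        os.replace(tmp_path, path)  # overwrite the original in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, `replace_with_transformed(path, lambda src, dest: encrypt_file(XOR.new(key), src, dest))` would encrypt a file in place without ever leaving a half-encrypted copy under the original name.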