I am trying to extract a password-protected .zip that contains a .txt document (say Congrats.txt for this case). Congrats.txt has text in it, so it is not 0 KB in size. It is placed in a .zip (for the sake of this thread, let's name it zipv1.zip) with the password dominique. That password is stored among other words and names within another .txt (which we'll call file.txt for the sake of this question).
Now if I run the code below with python Program.py -z zipv1.zip -f file.txt (assuming all these files are in the same folder as Program.py), my program displays dominique as the correct password for zipv1.zip among the other words/passwords in file.txt and extracts zipv1.zip, but Congrats.txt is empty and has a size of 0 KB.
My code is as follows:
import argparse
import multiprocessing
import zipfile

parser = argparse.ArgumentParser(description="Unzips a password protected .zip", usage="Program.py -z zip.zip -f file.txt")
# Creates -z arg
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
# Creates -f arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of file.txt.")
args = parser.parse_args()


def extract_zip(zip_filename, password):
    try:
        zip_file = zipfile.ZipFile(zip_filename)
        zip_file.extractall(pwd=password)
        print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
    except:
        # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
        pass


def main(zip, file):
    if (zip == None) | (file == None):
        # If the args are not used, it displays how to use them to the user.
        print(parser.usage)
        exit(0)
    # Opens the word list/password list/dictionary in "read binary" mode.
    txt_file = open(file, "rb")
    # Allows 8 instances of Python to be ran simultaneously.
    with multiprocessing.Pool(8) as pool:
        # "starmap" expands the tuples as 2 separate arguments to fit "extract_zip"
        pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])


if __name__ == '__main__':
    main(args.zip, args.file)
However, if I create another zip (zipv2.zip) with the same method as zipv1.zip, the only difference being that Congrats.txt is inside a folder, and that folder is zipped together with Congrats.txt, I get the same results as with zipv1.zip, but this time Congrats.txt is extracted along with the folder it was in, and Congrats.txt is intact; its text and its size are preserved.
To solve this, I tried reading zipfile's documentation, where I found out that if a password doesn't match the .zip it throws a RuntimeError. So I changed except: in the code to except RuntimeError: and got this error when trying to unzip zipv1.zip:
(venv) C:\Users\USER\Documents\Jetbrains\PyCharm\Program>Program.py -z zipv1.zip -f file.txt
[+] Password for the .zip: dominique
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 16, in extract_zip
zip_file.extractall(pwd=password)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1594, in extractall
self._extract_member(zipinfo, path, pwd)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1649, in _extract_member
shutil.copyfileobj(source, target)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\shutil.py", line 79, in copyfileobj
buf = fsrc.read(length)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 876, in read
data = self._read1(n)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 966, in _read1
self._update_crc(data)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 894, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 38, in <module>
main(args.zip, args.file)
File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 33, in main
pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
raise self._value
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'
The same result happens though: the password was found in file.txt and zipv1.zip was extracted, but Congrats.txt was empty and 0 KB in size. So I ran the program again, but for zipv2.zip this time, and got this as a result:
(venv) C:\Users\USER\Documents\Jetbrains\PyCharm\Program>Program.py -z zipv2.zip -f file.txt
[+] Password for the .zip: dominique
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 16, in extract_zip
zip_file.extractall(pwd=password)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1594, in extractall
self._extract_member(zipinfo, path, pwd)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1649, in _extract_member
shutil.copyfileobj(source, target)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\shutil.py", line 79, in copyfileobj
buf = fsrc.read(length)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 876, in read
data = self._read1(n)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 966, in _read1
self._update_crc(data)
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 894, in _update_crc
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 38, in <module>
main(args.zip, args.file)
File "C:\Users\USER\Documents\Jetbrains\PyCharm\Program\Program.py", line 33, in main
pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 276, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "C:\Users\USER\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
raise self._value
zipfile.BadZipFile: Bad CRC-32 for file 'Congrats.txt'
Again, the same results: the folder was extracted successfully, and Congrats.txt was also extracted with its text and size intact.
I did take a look at this similar thread, as well as this thread but they were no help. I also checked zipfile's documentation but it wasn't helpful regarding the issue.
Now, after switching to with zipfile.ZipFile(zip_filename, 'r') as zip_file:, for some unknown and weird reason the program can process a small word list/password list/dictionary but can't if it's large(?).
What I mean is this: say a .txt document named Congrats.txt, with the text You have cracked the .zip!, is present in zipv1.zip. The same .txt is present in zipv2.zip as well, but this time placed in a folder named ZIP Contents, which is then zipped/password protected. The password is dominique for both of the zips.
Do note that each .zip was generated using the Deflate compression method and ZipCrypto encryption in 7zip.
That password is on Line 35 of John The Ripper Jr.txt (35/52 lines) and on Line 1968 of John The Ripper.txt (1968/3106 lines).
Now if you run python Program.py -z zipv1 -f "John The Ripper Jr.txt" in your CMD (or IDE of your choice), it will create a folder named Extracted and place Congrats.txt in it, containing the sentence we previously set. The same goes for zipv2, but Congrats.txt will be in the ZIP Contents folder, which is inside the Extracted folder. No trouble extracting the .zips in this instance.
But if you try the same thing with John The Ripper.txt, i.e. python Program.py -z zipv1 -f "John The Ripper.txt" in your CMD (or IDE of your choice), it will create the Extracted folder for both of the zips, just like with John The Ripper Jr.txt, but this time Congrats.txt will be empty for both of them for some unknown reason.
My code and all necessary files are as follows:
import argparse
import multiprocessing
import zipfile

parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack.", usage="Program.py -z zip.zip -f file.txt")
# Creates -z arg
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
# Creates -f arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")
args = parser.parse_args()


def extract_zip(zip_filename, password):
    try:
        with zipfile.ZipFile(zip_filename, 'r') as zip_file:
            zip_file.extractall('Extracted', pwd=password)
            print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
    except:
        # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
        pass


def main(zip, file):
    if (zip == None) | (file == None):
        # If the args are not used, it displays how to use them to the user.
        print(parser.usage)
        exit(0)
    # Opens the word list/password list/dictionary in "read binary" mode.
    txt_file = open(file, "rb")
    # Allows 8 instances of Python to be ran simultaneously.
    with multiprocessing.Pool(8) as pool:
        # "starmap" expands the tuples as 2 separate arguments to fit "extract_zip"
        pool.starmap(extract_zip, [(zip, line.strip()) for line in txt_file])


if __name__ == '__main__':
    # Program.py - z zipname.zip -f filename.txt
    main(args.zip, args.file)
I am unsure why this is happening and cannot find an answer for this issue anywhere. It's totally unknown from what I can tell, and I can't find a way to debug or solve it.
This continues to occur regardless of the word/password list. I tried generating more .zips with the same Congrats.txt but with different passwords from different word lists/password lists/dictionaries. Same method: a larger and a smaller version of the .txt were used, and the same results as above were obtained.
BUT I did find out that if I cut out the first 2k words in John The Ripper.txt and make a new .txt, say John The Ripper v2.txt, the .zip is extracted successfully, the Extracted folder appears, and Congrats.txt is present with the text inside it. So I believe it has to do with the lines that come after the password, in this case after Line 1968; perhaps the script doesn't stop after Line 1968? I am not sure why this works though. It isn't a solution but a step towards one, I guess...
So I tried using "pool terminating" code:
import argparse
import multiprocessing
import zipfile

parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack using", usage="Program.py -z zip.zip -f file.txt")
# Creates -z arg
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
# Creates -f arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")
args = parser.parse_args()


def extract_zip(zip_filename, password, queue):
    try:
        with zipfile.ZipFile(zip_filename, "r") as zip_file:
            zip_file.extractall('Extracted', pwd=password)
            print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
            queue.put("Done")  # Signal success
    except:
        # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
        pass


def main(zip, file):
    if (zip == None) | (file == None):
        print(parser.usage)  # If the args are not used, it displays how to use them to the user.
        exit(0)
    # Opens the word list/password list/dictionary in "read binary" mode.
    txt_file = open(file, "rb")
    # Create a Queue
    manager = multiprocessing.Manager()
    queue = manager.Queue()
    with multiprocessing.Pool(8) as pool:  # Allows 8 instances of Python to be ran simultaneously.
        pool.starmap_async(extract_zip, [(zip, line.strip(), queue) for line in txt_file])  # "starmap" expands the tuples as 2 separate arguments to fit "extract_zip"
        pool.close()
        queue.get(True)  # Wait for a process to signal success
        pool.terminate()  # Terminate the pool
        pool.join()


if __name__ == '__main__':
    main(args.zip, args.file)  # Program.py -z zip.zip -f file.txt.
Now if I use this, both zips are extracted successfully, just like in the previous instances. BUT this time zipv1.zip's Congrats.txt is intact; it has the message inside it. The same cannot be said for zipv2.zip, as its Congrats.txt is still empty.
Sorry for the long pause ... It seems you've got yourself into a bit of a pickle.
Recap:
Working on a password protected .zip file
Brute force (ciobaneste) is attempted, using passwords from a file
The correct password is in the (previous step) file, but in spite of that, some files aren't properly extracted
The scenario is complex (quite far away from an MCVE, I'd say), there are many things that can be blamed for the behavior.
Starting with the zipv1.zip / zipv2.zip mismatch: on a closer look, it appears that zipv2 is messed up as well. While things are easy to spot for zipv1 (Congrats.txt being the only file), for zipv2 it is "ZIP Contents/Black-Large.png" that ends up 0 sized.
It is reproducible with any file, and more: it applies to the 1st entry (which is not a dir) returned by zf.namelist.
So, things start to get a little bit clearer:
File contents is being unpacked, due to dominique being present in the password file (don't know what happens til that point)
At a later point, the .zip's 1st entry is truncated to 0 bytes
Looking at the exceptions thrown when attempting to extract files using a wrong password, there are 3 types (out of which the last 2 can be grouped together):
RuntimeError: Bad password for file ...
Others:
zlib.error: Error -3 while decompressing data ...
zipfile.BadZipFile: Bad CRC-32 for file ...
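The grouping above can be written down as a small helper. This is only a sketch: the name classify_failure and the labels it returns are made up for illustration; the exception types and the "Bad password" message are the ones listed above.

```python
import zipfile
import zlib


def classify_failure(exc):
    """Map an exception raised during extraction to one of the groups above."""
    # RuntimeError is a broad class, so the message is checked as well.
    if isinstance(exc, RuntimeError) and "Bad password" in str(exc):
        return "rejected"        # refused before the target file is opened
    if isinstance(exc, (zlib.error, zipfile.BadZipFile)):
        return "false positive"  # got past the quick check, failed later
    return "unrelated"
```

Only the "false positive" group is dangerous here, because by the time those exceptions are raised the output file has already been opened for writing.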
I created an archive file of my own. For consistency's sake, I'll be using it from now on, but everything would apply to any other file as well.
Content:
DummyFile0.txt (10 bytes) - containing: 0123456789
DummyFile1.txt (10 bytes) - containing: 0000000000
DummyFile2.txt (10 bytes) - containing: AAAAAAAAAA
Archived the 3 files with Total Commander (v9.21a) internal Zip packer, password protecting it with dominique (zip2.0 encryption). The resulting archive (named it arc0.zip (but name is not relevant)), is 392 bytes long
code00.py:
#!/usr/bin/env python

import os
import sys
import zipfile


def main(*argv):
    arc_name = argv[0] if argv else "./arc0.zip"
    pwds = (
        #b"dominique",
        #b"dickhead",
        b"coco",
    )
    #pwds = [item.strip() for item in open("orig/John The Ripper.txt.orig", "rb").readlines()]
    print("Unpacking (password protected: dominique) {:s},"
          " using a list of predefined passwords ...".format(arc_name))
    if not os.path.isfile(arc_name):
        raise SystemExit("Archive file must exist!\nExiting.")
    faulty_pwds = list()
    good_pwds = list()
    with zipfile.ZipFile(arc_name, "r") as zip_file:
        print("Zip names: {:}\n".format(zip_file.namelist()))
        for idx, pwd in enumerate(pwds):
            try:
                zip_file.extractall("Extracted", pwd=pwd)
            except:
                exc_cls, exc_inst, exc_tb = sys.exc_info()
                if exc_cls != RuntimeError:
                    print("Exception caught when using password ({:d}): [{:}] ".format(idx, pwd))
                    print(" {:}: {:}".format(exc_cls, exc_inst))
                    faulty_pwds.append(pwd)
            else:
                print("Success using password ({:d}): [{:}] ".format(idx, pwd))
                good_pwds.append(pwd)
                input()
    print("\nFaulty passwords: {:}\nGood passwords: {:}".format(faulty_pwds, good_pwds))


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q054532010]> "e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" ./code00.py ./arc0.zip
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] 064bit on win32
Unpacking (password protected: dominique) arc0.zip, using a list of predefined passwords ...
Zip names: ['DummyFile0.txt', 'DummyFile1.txt', 'DummyFile2.txt']
Exception caught when using password (1189): [b'mariah']
 <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
Exception caught when using password (1446): [b'zebra']
 <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
Exception caught when using password (1477): [b'1977']
 <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
Success using password (1967): [b'dominique']
Exception caught when using password (2122): [b'hank']
 <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
Exception caught when using password (2694): [b'solomon']
 <class 'zlib.error'>: Error -3 while decompressing data: invalid distance code
Exception caught when using password (2768): [b'target']
 <class 'zlib.error'>: Error -3 while decompressing data: invalid block type
Exception caught when using password (2816): [b'trish']
 <class 'zlib.error'>: Error -3 while decompressing data: invalid code lengths set
Exception caught when using password (2989): [b'coco']
 <class 'zlib.error'>: Error -3 while decompressing data: invalid stored block lengths
Faulty passwords: [b'mariah', b'zebra', b'1977', b'hank', b'solomon', b'target', b'trish', b'coco']
Good passwords: [b'dominique']
Done.
Looking at ZipFile.extractall code, it tries to extract all the members. The 1st raises an exception, so it starts to be clearer why it behaves the way it does. But why the behavioral difference, when attempting to extract items using 2 wrong passwords?
As seen in the tracebacks of the 2 different thrown exception types, the answer lies somewhere at the end of ZipFile.open.
After more investigations, it turns out it's because of a collision in the one-byte password check described below.
According to [UT.CS]: dmitri-report-f15-16.pdf - Password-based encryption in ZIP files ((last) emphasis is mine):
3.1 Traditional PKWARE encryption
The original encryption scheme, commonly referred to as the PKZIP cipher, was designed by Roger Schaffely [1]. In [5] Biham and Kocher showed that the cipher is weak and demonstrated an attack requiring 13 bytes of plaintext. Further attacks have been developed, some of which require no user provided plaintext at all [6]. The PKZIP cipher is essentially a stream cipher, i.e. input is encrypted by generating a pseudo- random key stream and XOR-ing it with the plaintext. The internal state of the cipher consists of three 32-bit words: key0, key1 and key2. These are initialized to 0x12345678, 0x23456789 and 0x34567890, respectively. A core step of the algorithm involves updating the three keys using a single byte of input...
...
Before encrypting a file in the archive, 12 random bytes are first prepended to its compressed contents and the resulting bytestream is then encrypted. Upon decryption, the first 12 bytes need to be discarded. According to the specification, this is done in order to render a plaintext attack on the data ineffective. The specification also states that out of the 12 prepended bytes, only the first 11 are actually random, the last byte is equal to the high order byte of the CRC-32 of the uncompressed contents of the file. This gives the ability to quickly verify whether a given password is correct by comparing the last byte of the decrypted 12 byte header to the high order byte of the actual CRC-32 value that is included in the local file header. This can be done before decrypting the rest of the file.
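To make the quoted description concrete, here is a minimal pure-Python sketch of the traditional PKZIP cipher applied to the 12-byte encryption header. The function names (crc32_byte, decrypt_header, passes_quick_check) are made up for this illustration; the key constants come from the quote above, and the key-update step follows the usual published description of the algorithm. It is meant as a reading aid, not as a replacement for zipfile's own decrypter.

```python
def _make_crc_table():
    """Standard CRC-32 table (polynomial 0xEDB88320), as used by ZIP."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = _make_crc_table()


def crc32_byte(byte, crc):
    """Advance a CRC-32 value by a single byte."""
    return (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]


def decrypt_header(password, header):
    """Decrypt the 12-byte encryption header with the given password (bytes)."""
    key0, key1, key2 = 0x12345678, 0x23456789, 0x34567890

    def update_keys(byte):
        nonlocal key0, key1, key2
        key0 = crc32_byte(byte, key0)
        key1 = (key1 + (key0 & 0xFF)) & 0xFFFFFFFF
        key1 = (key1 * 134775813 + 1) & 0xFFFFFFFF
        key2 = crc32_byte(key1 >> 24, key2)

    for byte in password:   # the password seeds the three keys
        update_keys(byte)

    plain = bytearray()
    for byte in header:
        k = key2 | 2
        byte ^= ((k * (k ^ 1)) >> 8) & 0xFF   # XOR with the next keystream byte
        update_keys(byte)                     # keys advance on the plaintext byte
        plain.append(byte)
    return bytes(plain)


def passes_quick_check(password, header, crc):
    """True if the last decrypted header byte equals the CRC's high-order byte."""
    return decrypt_header(password, header)[-1] == (crc >> 24) & 0xFF


# Header bytes and CRC copied from the interactive session in this answer.
header = b"\xd1\x86y ^\xd77gRzZ\xee"
crc = 0xA684C7C6
for pwd in (b"coco", b"dominique", b"other"):
    print(pwd, passes_quick_check(pwd, header, crc))
```

Since only one byte is compared, passing this check says very little about whether the password is actually right.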
Other references:
The algorithm weakness: because the check is done on one byte only, on average about 1 in 256 wrong passwords will generate the same check byte as the correct password.
The algorithm discards most of the wrong passwords, but there are some that it doesn't.
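As a rough back-of-the-envelope estimate, assume that for a wrong password the decrypted check byte behaves like a uniformly random byte (an assumption made for this sketch, not something the ZIP specification guarantees). The word count is the length of John The Ripper.txt given in the question.

```python
CHECK_BYTE_VALUES = 256   # a single byte is compared
wordlist_size = 3106      # lines in "John The Ripper.txt" (from the question)
wrong_passwords = wordlist_size - 1

# Expected number of wrong passwords that slip past the one-byte check.
expected_false_positives = wrong_passwords / CHECK_BYTE_VALUES

# Probability that at least one wrong password slips past.
p_at_least_one = 1 - (1 - 1 / CHECK_BYTE_VALUES) ** wrong_passwords

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one):          {p_at_least_one:.6f}")
```

Under that assumption the estimate comes out at roughly a dozen, which is the same order of magnitude as the 8 faulty passwords in the output above.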
Going back: when a file is attempted to be extracted using a password:
If the "hash" computed on the file cipher's last byte is different than file CRC's high order byte, an exception is thrown
But, if they are equal:
A new file stream is open for writing (emptying the file if already existing)
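The decompression is attempted; for a wrong password it fails (with one of the last 2 exception types above), and the freshly emptied file is left at 0 bytes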
As seen from the output above, for my (.zip) file there are 8 passwords that mess it up. Note that:
For each archive file the result differs
The member file name and content are relevant (at least for the 1st one). Changing any of those will yield different results (for the "same" archive file)
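One way to sidestep such passwords from user code is to test a candidate entirely in memory before anything is written to disk. The sketch below is an illustration, not something zipfile does for you; the function name password_opens_all_members is made up. It relies on the fact that reading a member to the end makes zipfile verify its CRC-32.

```python
import zipfile
import zlib


def password_opens_all_members(zip_path, password):
    """Return True only if every member decrypts, inflates and passes its CRC."""
    try:
        with zipfile.ZipFile(zip_path, "r") as zf:
            for info in zf.infolist():
                with zf.open(info, pwd=password) as member:
                    # Reading to EOF triggers the CRC-32 check;
                    # nothing is written to disk here.
                    while member.read(64 * 1024):
                        pass
    except (RuntimeError, zlib.error, zipfile.BadZipFile):
        # RuntimeError: rejected by the one-byte check.
        # zlib.error / BadZipFile: passed that check but produced garbage.
        return False
    return True
```

Most wrong passwords are still rejected cheaply by the one-byte check, so the full read only happens for the correct password and the few colliding ones. Calling extractall only after this returns True avoids the truncation described above.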
Here's a test based on data from my .zip file:
>>> import zipfile
>>>
>>> zd_coco = zipfile._ZipDecrypter(b"coco")
>>> zd_dominique = zipfile._ZipDecrypter(b"dominique")
>>> zd_other = zipfile._ZipDecrypter(b"other")
>>> cipher = b'\xd1\x86y ^\xd77gRzZ\xee'  # Member (1st) file cipher: 12 bytes starting from archive offset 44
>>>
>>> crc = 2793719750  # Member (1st) file CRC - archive bytes: 14 - 17
>>> hex(crc)
'0xa684c7c6'
>>> for zd in (zd_coco, zd_dominique, zd_other):
...     print(zd, [hex(zd(c)) for c in cipher])
...
<zipfile._ZipDecrypter object at 0x0000021E8DA2E0F0> ['0x1f', '0x58', '0x89', '0x29', '0x89', '0xe', '0x32', '0xe7', '0x2', '0x31', '0x70', '0xa6']
<zipfile._ZipDecrypter object at 0x0000021E8DA2E160> ['0xa8', '0x3f', '0xa2', '0x56', '0x4c', '0x37', '0xbb', '0x60', '0xd3', '0x5e', '0x84', '0xa6']
<zipfile._ZipDecrypter object at 0x0000021E8DA2E128> ['0xeb', '0x64', '0x36', '0xa3', '0xca', '0x46', '0x17', '0x1a', '0xfb', '0x6d', '0x6c', '0x4e']
>>> # As seen, the last element of the first 2 arrays (coco and dominique) is 0xA6 (166), which is the same as the first byte of the CRC
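The offsets mentioned in the comments (CRC at archive bytes 14 - 17, cipher starting at offset 44) follow from the layout of the ZIP local file header. The sketch below reads them for the first member of an archive; the function name first_member_header is made up, and it assumes the archive starts directly with a local file header.

```python
import struct

# Fixed 30-byte part of a local file header: signature, version, flags,
# method, mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def first_member_header(archive_bytes):
    """Return (crc, data_offset, first 12 data bytes) for the first member."""
    (signature, _version, _flags, _method, _mtime, _mdate,
     crc, _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(archive_bytes, 0)
    if signature != b"PK\x03\x04":
        raise ValueError("archive does not start with a local file header")
    # If general-purpose flag bit 3 is set, the CRC is stored in a data
    # descriptor after the data and this field is 0.
    data_offset = LOCAL_HEADER.size + name_len + extra_len
    # For an encrypted member, the first 12 data bytes are the encryption header.
    return crc, data_offset, archive_bytes[data_offset:data_offset + 12]
```

For a first member named DummyFile0.txt (14 characters) with no extra field, the data offset is 30 + 14 = 44, which matches the comment in the session above.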
I did some tests with other unpacking engines (with default arguments):
WinRar: for a wrong password the file is untouched, but for a faulty one it is truncated (same as here)
7-Zip: It asks the user whether to overwrite the file, and it overwrites it regardless of the decompression result
Total Commander's internal (Zip) unpacker: same as #2.
I see this as a ZipFile bug. Specifying such a faulty (and wrong) password shouldn't overwrite the existing file (if any). Or at least, behavior should be consistent (for all wrong passwords)
A quick browse didn't reveal any existing Python bug report about this
I don't see an easy fix, as:
The Zip algorithm can't be improved (to better check whether a password is OK)
I thought of a couple of fixes, but they will either negatively impact performance or could introduce regressions in some (corner) cases
I've submitted [GitHub]: python/cpython - [3.6] bpo-36247: zipfile - extract truncates (existing) file when bad password provided (zip encryption weakness), which was closed for branch 3.6 (which is in security fixes only mode). Not sure what its outcome is going to be (in other branches), but anyway, it won't be available anytime soon (in the next months, let's say).
As an alternative, you could download the patch, and apply the changes locally. Check [SO]: Run / Debug a Django application's UnitTests from the mouse right click context menu in PyCharm Community Edition? (@CristiFati's answer) (Patching UTRunner section) for how to apply patches on Win (basically, every line that starts with one "+" sign goes in, and every line that starts with one "-" sign goes out).
You could copy zipfile.py from Python's dir to your project (or some "personal") dir and patch that file, if you want to keep your Python installation pristine.
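If patching zipfile.py is not an option, a workaround in user code is to never extract straight into the final folder. The sketch below extracts into a scratch directory and only moves the result into place when extractall finished without an exception, so a colliding wrong password cannot truncate files from an earlier, successful extraction. The function name extract_safely is made up for this illustration, and note that it deliberately replaces an existing destination folder.

```python
import os
import shutil
import tempfile
import zipfile
import zlib


def extract_safely(zip_path, password, dest_dir):
    """Extract into a scratch directory and publish it only on full success."""
    parent = os.path.dirname(os.path.abspath(dest_dir))
    os.makedirs(parent, exist_ok=True)
    # The scratch directory sits next to the destination, so the final move
    # stays on the same filesystem.
    scratch = tempfile.mkdtemp(prefix=".extract-", dir=parent)
    try:
        with zipfile.ZipFile(zip_path, "r") as zf:
            zf.extractall(scratch, pwd=password)
    except (RuntimeError, zlib.error, zipfile.BadZipFile):
        shutil.rmtree(scratch, ignore_errors=True)  # discard partial output
        return False
    if os.path.isdir(dest_dir):
        shutil.rmtree(dest_dir)                     # replace an older result
    shutil.move(scratch, dest_dir)
    return True
```

In the question's extract_zip, a call such as extract_safely(zip_filename, password, 'Extracted') would take the place of the direct extractall call; each worker process gets its own scratch directory, so the workers don't overwrite each other's output.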