I'm working on a PDF decryption task.
The PDF vendor encrypts the data of each stream with their own method, and they also provide a decrypt function.
I have verified that the decrypt function works correctly on the data of one stream. But a PDF file can contain many streams, so I need to extract the data of each stream and feed it to the decrypt function.
Below is one of the streams from the encrypted PDF file:
6 0 obj
<</Length 608/Filter[/VendorPDFEncrypt/FlateDecode]>>stream
data_1
endstream
endobj
The vendor also gave me the decrypted PDF file, so I found the corresponding stream in it, shown below. As you can see, the vendor-added filter is gone and the data has changed.
2 0 obj
<</Filter[/FlateDecode]/Length 598>>stream
data_2
endstream
endobj
Summary of the process:
encrypted PDF file -> extract the data of each stream -> feed each one to the decrypt function -> get a readable PDF file
My question is: how do I extract the data of each stream from the PDF file, so that I can pass it to the decrypt function?
You can use cpdf to export the PDF file to JSON, which includes the stream data in its still-compressed form:
cpdf -output-json in.pdf -o out.json
Then, once you have processed the JSON to decrypt the streams, you can check the result by round-tripping back to PDF:
cpdf -j new.json -o out.pdf
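If you would rather pull the raw stream bytes out directly, here is a rough Python sketch that scans the file for stream objects and slices out the data using /Length. It is only a sketch under several assumptions: /Length is a direct integer (not an indirect reference like 12 0 R), the objects are not packed inside object streams, and the file is not otherwise damaged. The name extract_streams is my own, not part of any library.

```python
import re

# "N G obj << dict >> stream<EOL>". The tempered dot stops the dictionary
# match at "endobj", so a non-stream object cannot swallow the next one.
STREAM_HEADER = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*<<((?:(?!endobj).)*?)>>\s*stream\r?\n",
    re.DOTALL,
)
# A direct integer /Length; "\b" plus the lookahead rejects "/Length 12 0 R".
LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def extract_streams(pdf_bytes):
    """Yield (obj_num, generation, dict_bytes, stream_bytes) for each stream.

    Assumption: /Length is a direct integer. If it is missing or indirect,
    fall back to searching for "endstream", in which case the data may
    carry a trailing end-of-line marker.
    """
    pos = 0
    while True:
        m = STREAM_HEADER.search(pdf_bytes, pos)
        if m is None:
            break
        start = m.end()
        length = LENGTH.search(m.group(3))
        if length:
            end = start + int(length.group(1))
        else:
            end = pdf_bytes.find(b"endstream", start)
            if end == -1:
                break
        yield int(m.group(1)), int(m.group(2)), m.group(3), pdf_bytes[start:end]
        # Resume after the stream data so binary content is never parsed
        # as if it were PDF syntax.
        pos = end
```

You would read the file with open(path, "rb").read(), loop over extract_streams(...), and pass each stream_bytes to the vendor's decrypt function. Writing the results back is the harder part (the /Length values and the xref table must be updated), which is why the JSON round trip above may be the more convenient route.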