I am working on extracting PDFs from SEC filings. They usually come like this:
For whatever reason when I save the raw PDF to a .text file, and then try to run
uudecode -o output_file.pdf input_file.txt
from the python subprocess.call()
function or any other python function that allows commands to be executed from the command line, the PDF files that are generated are corrupted. If I run this same command from the command line directly there is no corruption.
When taking a closer look at the PDF file being output from the python script, it looks like the file ends prematurely. Is there some sort of output limit when executing a command line command from python?
Thanks!
This script worked fine for me running under Python 3.4.1 on Fedora 21 x86_64 with uudecode 4.15.2:
import subprocess
subprocess.call("uudecode -o output_file.pdf input_file.txt", shell=True)
Using the linked SEC filing (length: 173,141 B; sha1: e4f7fa2cbb3422411c2f2968d954d6bb9808b884
), the decoded PDF (length: 124,557 B; sha1: 1676320e1d9923e14d19451c16688198bc93ca0d
) appears correct when viewed.
There may be something else in your environment causing the problem. You may want to add additional details to your question.
Is there some sort of output limit when executing a command line command from python?
If by "output limit" you mean the size of the file being written by uudecode
, then no. The only type of "output limit" you need to worry about when using the subprocess
module is when you pass stdout=PIPE
or stderr=PIPE
when creating a child process. If the child process writes enough data to either of these streams, and your script does not regularly drain them, the child process will block (see the subprocess
module documentation). In my test, uudecode
wrote nothing to stdout
or stderr
.