I built a Django application for a client about a year ago. He has now resold the application to some super secret government agency whose name they won't even tell me.
Part of the application dynamically generates PDF files using the Python library xhtml2pdf (pisa). The government doesn't like this library. They won't tell me why, but they said I have to use HTMLDOC for PDF generation.
There's not much documentation on this library, but judging from the PHP example, it looks like you can just communicate with it through the shell, so it should work with Python. However, I'm having a hard time passing the HTML to HTMLDOC. It looks like HTMLDOC will only accept a file, but I really need to pass the HTML as a string since it's dynamically generated (or write the HTML string to a temporary file and then pass that file to HTMLDOC).
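Below is a minimal sketch of the temporary-file route mentioned above. It rests on several assumptions: Python 3, an htmldoc binary at /usr/bin/htmldoc, and a hypothetical helper named `html_to_pdf_via_tempfile`. The `-t pdf --webpage` flags are borrowed from the answer further down. The command is given as a list, so no shell quoting is involved.

```python
import os
import subprocess
import tempfile


def html_to_pdf_via_tempfile(html, argv_prefix=("/usr/bin/htmldoc", "-t", "pdf", "--webpage")):
    """Write `html` to a temporary .html file, run argv_prefix + [path], return stdout bytes.

    Hypothetical helper; the htmldoc path and flags are assumptions to adjust per system.
    """
    # delete=False so the child process can reopen the file by name after we close it.
    tmp = tempfile.NamedTemporaryFile(suffix=".html", delete=False)
    try:
        with tmp:  # closes (but does not delete) the file when the block ends
            tmp.write(html.encode("utf-8"))
        # Argument list, no shell: the path is passed through verbatim.
        proc = subprocess.Popen(list(argv_prefix) + [tmp.name],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(err.decode("utf-8", "replace"))
        return out
    finally:
        os.remove(tmp.name)  # always clean up the temporary file
```

The file is closed before the child runs because reopening a still-open NamedTemporaryFile by name is not portable.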
I thought StringIO would work for this, but I'm getting an error. Here's what I have:
def render_to_pdf(template_src, context_dict):
    template = get_template(template_src)
    context = Context(context_dict)
    html = template.render(context)
    result = StringIO.StringIO(html.encode("utf-8"))
    os.putenv("HTMLDOC_NOCGI", "1")
    # this line throws "[Errno 2] No such file or directory"
    htmldoc = subprocess.Popen("htmldoc -t pdf --quiet '%s'" % result, stdout=subprocess.PIPE).communicate()
    pdf = htmldoc[0]
    result.close()
    return HttpResponse(pdf, mimetype='application/pdf')
Any ideas, tips, or help would be really appreciated.
Thanks.
UPDATE
Stack Trace:
Environment:
Request Method: GET
Request URL: (redacted)
Django Version: 1.3 alpha 1 SVN-14921
Python Version: 2.6.5
Installed Applications:
['django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.sites',
'django.contrib.messages',
'django.contrib.admin',
'application']
Installed Middleware:
('django.middleware.common.CommonMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware')
Traceback:
File "/usr/local/lib/python2.6/dist-packages/django/core/handlers/base.py" in get_response
111. response = callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python2.6/dist-packages/django/contrib/auth/decorators.py" in _wrapped_view
23. return view_func(request, *args, **kwargs)
File "/home/ascgov/application/views/pdf.py" in application_pdf
90. 'user':owner})
File "/home/ascgov/application/views/pdf.py" in render_to_pdf
53. htmldoc = subprocess.Popen("/usr/bin/htmldoc -t pdf --quiet '%s'" % result, stdout=subprocess.PIPE).communicate()
File "/usr/lib/python2.6/subprocess.py" in __init__
633. errread, errwrite)
File "/usr/lib/python2.6/subprocess.py" in _execute_child
1139. raise child_exception
Exception Type: OSError at /pdf/application/feed-filtr/
Exception Value: [Errno 2] No such file or directory
First, subprocess.Popen's first argument should generally be a list (unless you also pass shell=True). The "No such file or directory" error is almost certainly caused by the absence of a file named "htmldoc -t pdf --quiet '..." on the system: Popen is trying to find and run a program named after the whole string value.
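Here is a small Python 3 illustration of that failure mode. The program name is deliberately fake so the call fails the same way, and report.html is a hypothetical stand-in for the question's '%s' argument. shlex.split shows the argument list Popen actually expects.

```python
import errno
import shlex
import subprocess

cmd = "htmldoc -t pdf --quiet 'report.html'"

errno_seen = None
try:
    # Without shell=True, a plain string is treated as the *whole* program name.
    # This fake name guarantees no such executable exists.
    subprocess.Popen("no-such-program-htmldoc -t pdf --quiet 'report.html'")
except OSError as exc:
    errno_seen = exc.errno  # errno.ENOENT: "[Errno 2] No such file or directory"

# Split the command string into the list form Popen expects.
argv = shlex.split(cmd)
print(argv)  # ['htmldoc', '-t', 'pdf', '--quiet', 'report.html']
```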
Second, if you give htmldoc some html on its stdin, it'll spit out a pdf on its stdout, thus avoiding the need for a temporary file.
Give this a try (untested):
htmldoc = subprocess.Popen(
['/usr/bin/htmldoc', '-t', 'pdf', '--webpage', '-'],
stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
stdout, stderr = htmldoc.communicate(html)
NB: replace /usr/bin/htmldoc with the real path to htmldoc on your system.
The - argument tells the htmldoc program to read from stdin. You pass your HTML string value (html) to htmldoc's stdin as the argument to the htmldoc.communicate call. The resulting PDF output should be available in stdout, and any other messages or stats in stderr.
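The following is a slightly more defensive version of the snippet above. It is a sketch under assumptions: Python 3, htmldoc at /usr/bin/htmldoc, and the hypothetical helper names htmldoc_argv, run_filter and html_to_pdf. It also checks the exit status, so an htmldoc failure surfaces its stderr instead of quietly producing an empty response.

```python
import os
import subprocess


def htmldoc_argv(htmldoc="/usr/bin/htmldoc"):
    # Flags taken from the answer; "-" asks htmldoc to read HTML from stdin.
    return [htmldoc, "-t", "pdf", "--webpage", "-"]


def run_filter(argv, data, extra_env=None):
    """Send `data` (bytes) to the process's stdin and return its stdout (bytes).

    Raises RuntimeError carrying stderr if the process exits non-zero.
    """
    env = dict(os.environ, **(extra_env or {}))
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, env=env)
    out, err = proc.communicate(data)  # writes data, closes stdin, reads both pipes
    if proc.returncode != 0:
        raise RuntimeError("%s exited with status %d: %s"
                           % (argv[0], proc.returncode, err.decode("utf-8", "replace")))
    return out


def html_to_pdf(html, htmldoc="/usr/bin/htmldoc"):
    # HTMLDOC_NOCGI=1 is carried over from the question's os.putenv call.
    return run_filter(htmldoc_argv(htmldoc), html.encode("utf-8"),
                      extra_env={"HTMLDOC_NOCGI": "1"})
```

In the Django view, the bytes returned by html_to_pdf(html) would take the place of pdf in the HttpResponse call.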
Edit: The documentation does seem a bit wonky, but there is quite a lot of it. You might have better luck with the single-page HTML or PDF versions, or with the man page.
Also, be sure to pass a string, or similar, to the stdin of the htmldoc process. An encoded byte string such as html.encode("utf-8") works, as in the question's own code. Passing a StringIO object directly, as my previous code snippet implied, won't work.
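The question's '%s' % result also explains the odd command line. Formatting a file-like object inserts its repr, not its contents. Below is a small Python 3 illustration, with io.BytesIO standing in for Python 2's StringIO.StringIO (an assumption: the Python 2 repr text differs but behaves the same way).

```python
import io

buf = io.BytesIO(u"<p>caf\u00e9</p>".encode("utf-8"))

# Interpolating the buffer object yields its repr, something like
# "'<_io.BytesIO object at 0x...>'", not the HTML it holds.
as_arg = "'%s'" % buf

# To feed the HTML to htmldoc's stdin, hand communicate() the bytes themselves.
payload = buf.getvalue()
```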