
Has anyone been able to use Poppler's new_from_data in Python?


Using Python 3 and Poppler, I can load files with new_from_file without any problem, but new_from_data is problematic. The code below is only a simple test: it makes no sense to read a file and then pass its contents to new_from_data when new_from_file works perfectly, but I could not post the full code that generates the PDF file here.

import gi
gi.require_version("Poppler", "0.18")
gi.require_version("Gtk", "3.0")
from gi.repository import Poppler, Gtk

def draw(widget, cr):
    # Set the window background.
    cr.set_source_rgb(0.7, 0.6, 0.5)
    cr.paint()

    # Set the page background.
    cr.set_source_rgb(1, 1, 1)
    cr.rectangle(0, 0, 800, 400)
    cr.fill()

    # Render the first PDF page on top of it.
    page.render(cr)

filepath = "d:/Mes Documents/A5.pdf"
with open(filepath, "r", encoding="cp850") as f11:
    data1 = f11.read()

document = Poppler.Document.new_from_data(data1, len(data1), None)
page = document.get_page(0)
print(document.get_n_pages())

window = Gtk.Window(title="Hello World")
window.connect("delete-event", Gtk.main_quit)
window.connect("draw", draw)
window.set_app_paintable(True)

window.show_all()
Gtk.main()

One of four things happens:

  • With a very simple PDF (the "Hello world" example in PDF Reference 13), it works.
  • With a normal file, there may be no error, but get_n_pages returns 0 and get_page(0) returns None.
  • Or I get an error: GLib.Error: poppler-quark: PDF document is damaged (4)
  • Or the program crashes.

I suspect the problem is the encoding parameter, but nothing I could think of helped. I also tried opening the file with "rb" and then converting the bytes to a string with:

data1 = "".join(map(chr, data1))

It made no difference.

Searching Google never turned up a working example.
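The symptoms above are consistent with the PDF bytes being altered before Poppler ever sees them. Text mode ("r") decodes the bytes as cp850 and translates CRLF line endings to LF, which already shifts the byte offsets a PDF's cross-reference table relies on. On top of that, assuming the binding treats the `data` argument as a UTF-8 C string (the GObject-Introspection default for a plain `char *` parameter), every character above U+007F is re-encoded as two or more bytes, and `len(data1)` counts characters rather than bytes. The pure-Python sketch below imitates both steps; the two byte strings are hypothetical stand-ins for real files, and `as_c_string` models that assumed UTF-8 conversion rather than calling Poppler.

```python
import io

# Hypothetical start of a typical PDF: CRLF line endings plus the binary
# comment line (bytes >= 0x80) that most PDF files begin with.
binary_pdf = b"%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\r\n"
# Hypothetical ASCII-only, LF-only content, like a small hand-written PDF.
ascii_pdf = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"


def read_like_text_mode(data: bytes) -> str:
    """Decode the way open(path, "r", encoding="cp850").read() does:
    cp850 decoding plus universal-newline translation (CRLF -> LF)."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="cp850").read()


def as_c_string(text: str) -> bytes:
    """Assumption: a str passed to a UTF-8 C string argument is sent as UTF-8."""
    return text.encode("utf-8")


for name, raw in (("binary", binary_pdf), ("ascii", ascii_pdf)):
    text = read_like_text_mode(raw)
    sent = as_c_string(text)
    # len(text) is what the question passes as the length argument.
    print(name, len(raw), len(text), len(sent), sent == raw)
```

Only the ASCII-only, LF-only input comes back byte-for-byte identical, which would fit the observation that the simple "Hello world" file works while normal files come out empty, damaged, or crash the program.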


Solution

  • I ran into the same problem and solved it using Gio.MemoryInputStream. It is not really elegant, but it works:

    import gi
    gi.require_version("Poppler", "0.18")
    gi.require_version("Gtk", "3.0")
    from gi.repository import Poppler, Gtk, Gio

    def draw(widget, cr):
        # Set the window background.
        cr.set_source_rgb(0.7, 0.6, 0.5)
        cr.paint()

        # Set the page background.
        cr.set_source_rgb(1, 1, 1)
        cr.rectangle(0, 0, 800, 400)
        cr.fill()

        # Render the first PDF page on top of it.
        page.render(cr)

    filepath = "d:/Mes Documents/A5.pdf"
    with open(filepath, "rb") as f11:
        # "rb" keeps the raw bytes: no decoding, no newline translation.
        input_stream = Gio.MemoryInputStream.new_from_data(f11.read())
        # Take care: you need to call .close() on the Gio.MemoryInputStream once you are done with the PDF document.

    # Arguments: stream, length (-1 = unknown), password, cancellable.
    document = Poppler.Document.new_from_stream(input_stream, -1, None, None)
    page = document.get_page(0)
    print(document.get_n_pages())

    window = Gtk.Window(title="Hello World")
    window.connect("delete-event", Gtk.main_quit)
    window.connect("draw", draw)
    window.set_app_paintable(True)

    window.show_all()
    Gtk.main()
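On newer Poppler releases the stream detour may not be needed. As far as I know, `poppler_document_new_from_data` was deprecated in Poppler 0.82 in favour of `poppler_document_new_from_bytes`, which takes a `GBytes` buffer and therefore never passes the PDF through a text string. The sketch below assumes Poppler 0.82 or later with its GObject-Introspection typelib installed; the helper names `read_pdf_bytes` and `document_from_bytes` are hypothetical.

```python
def read_pdf_bytes(path: str) -> bytes:
    """Return the file's raw bytes. Binary mode means no decoding and
    no newline translation touch the PDF data."""
    with open(path, "rb") as f:
        return f.read()


def document_from_bytes(data: bytes, password=None):
    """Build a Poppler document from in-memory PDF bytes.

    Assumes Poppler >= 0.82, where Document.new_from_bytes() exists.
    Poppler raises GLib.Error if it cannot parse the data.
    """
    # PyGObject is imported here so read_pdf_bytes() can be used on its own.
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import GLib, Poppler

    # GLib.Bytes wraps the buffer as binary data, not as a UTF-8 string.
    return Poppler.Document.new_from_bytes(GLib.Bytes.new(data), password)
```

For example, `document_from_bytes(read_pdf_bytes(filepath))` should behave like `new_from_file`, and when the PDF is generated in memory, its bytes can be passed straight to `document_from_bytes` without any file in between.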