
pdfquery/PyQuery: example code shows no AttributeError but mine does...why?


I'm following the example code found here. The author has some documentation where he lists the steps he used to write the program. When I run the whole program at once it runs perfectly, but when I follow his steps one at a time I get an AttributeError.

Here's my code

import pdfquery

pdf = pdfquery.PDFQuery("Aberdeen_2015_1735t.pdf")
pdf.load()
pdf.tree.write("test3.xml", pretty_print=True, encoding="utf-8")

sept = pdf.pq('LTPage[pageid=\'1\'] LTTextLineHorizontal:contains("SEPTEMBER")')
print(sept.text())

x = float(sept.get('x0'))
y = float(sept.get('y0'))
cells = pdf.extract( [
     ('with_parent','LTPage[pageid=\'1\']'),
     ('cells', 'LTTextLineHorizontal:in_bbox("%s,%s,%s,%s")' % (x, y, x+600, y+20))
])

Everything runs fine until it reaches sept.get, where it fails with "'PyQuery' object has no attribute 'get'". Does anyone know why the program doesn't hit this error when it's run as a whole, but does when only this piece of the code is run?


Solution

  • According to the PyQuery API reference, a PyQuery object indeed doesn't have a get member. The code example must be obsolete.

    According to https://pypi.python.org/pypi/pdfquery, attributes are retrieved with .attr:

    x = float(sept.attr('x0'))
    

    Judging by the history of pyquery's README.rst, get was never documented and only worked due to some side effect (some delegation to a dict, perhaps).