python, python-3.x, jupyter-notebook, wrapper, pdf-parsing

Can only use wrapper function a single time after definition then getting NameError


Background

I'm using pdfquery to scrape data from PDFs. Like this one. This question builds off my earlier question here.

I have been able to use custom wrapper functions that take arguments, as seen in this answer. However, the following gives me trouble when I try to run it multiple times in a Jupyter notebook:

Cell 1

import pdfquery

def load_file(PDF_FILE):
    pdf = pdfquery.PDFQuery(PDF_FILE)
    pdf.load()
    return pdf

file_with_table = 'path_to_the_file_mentioned_above.pdf'
pdf = load_file(file_with_table)

Cell 2

def in_range(prop, bounds):
    def wrapped(*args, **kwargs):
        n = float(this.get(prop, 0))
        return bounds[0] <= n <= bounds[1]
    return wrapped

def is_element(element_type):
    def wrapped(*args, **kwargs):
        return this.tag in element_type
    return wrapped

def str_len(condition):
    def wrapped(*args, **kwargs):
        cond = ''.join([str(len(this.text)),condition])
        return eval(cond)
    return wrapped

Cell 3

x_check = in_range('x0', (97, 160))
y_check = in_range('y0', (250, 450))
el_check = is_element(['LTTextLineHorizontal', 'LTTextBoxHorizontal'])
str_len = str_len('>0')

els = pdf.pq('LTPage[page_index="0"] *').filter(el_check)
els = els.filter(str_len)
els = els.filter(x_check)
els = els.filter(y_check)

[(i.text) for i in els]

The function, str_len, works fine if it is run a single time after definition:

[Screenshot: no error when running the third cell the first time.]


but throws a NameError when I try to run the function a second time:

[Screenshot: NameError after running the third cell a second time.]


Here is the text of the NameError:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-27-54cd329bb1e1> in <module>()
      2 y_check = in_range('y0', (250, 450))
      3 el_check = is_element(['LTTextLineHorizontal', 'LTTextBoxHorizontal'])
----> 4 str_len = str_len('>0')
      5 
      6 els = pdf.pq('LTPage[page_index="0"] *').filter(el_check)

<ipython-input-25-654bff7d0eed> in wrapped(*args, **kwargs)
     12 def str_len(condition):
     13     def wrapped(*args, **kwargs):
---> 14         return eval(''.join([str(len(this.text)),condition]))
     15     return wrapped

NameError: name 'this' is not defined 

Questions

Why can I only use this function once after its definition?

Is there any way that I can circumvent this problem?


Solution

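  • Function names are variables like any other; there isn't a separate namespace for functions. str_len = str_len('>0') rebinds the name str_len to the return value of the call, which is the inner wrapped function, so after this line you no longer have a reference to the original str_len. When the cell runs a second time, str_len('>0') calls wrapped directly instead of the original function, and because that call happens outside a pdfquery filter, this is not defined, hence the NameError. Use a different name for the returned filter function: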

    new_name = str_len('>0')
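
The rebinding can be reproduced without pdfquery. The following is a minimal sketch, not the asker's exact notebook: it reuses str_len from the question, while make_str_len, has_text, name_after_first_run and second_run_error are hypothetical names introduced only for illustration. It assumes, as the traceback suggests, that this exists only while a filter call is running.

```python
def str_len(condition):
    def wrapped(*args, **kwargs):
        # `this` is expected to be supplied by the filter machinery;
        # it is not defined when `wrapped` is called directly.
        return eval(''.join([str(len(this.text)), condition]))
    return wrapped

# Hypothetical second reference to the original function, kept for the fix below.
make_str_len = str_len

# Problem: the first run of the cell rebinds the name to the inner function.
str_len = str_len('>0')
name_after_first_run = str_len.__name__

# The second run of the same line now calls wrapped('>0') directly.
try:
    str_len = str_len('>0')
    second_run_error = None
except NameError as exc:
    second_run_error = str(exc)

# Fix: store the result under a different name; this line can be re-run safely.
has_text = make_str_len('>0')
has_text = make_str_len('>0')
```

In the notebook, this would correspond to re-running Cell 2 to restore the original definition and then changing Cell 3 to assign the result to a new name and pass that name to els.filter.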