Is there a good API definition for the Python PDFMiner package?
For example, I can see from the source code that LTText contains x0, y0, x1, y1 and some text, and that there is a get_text() method that returns the text. Is the intention to just access x0 etc. directly? In which case, why wrap the text using _text and get_text()?
The project isn't heavily documented, so you'll have to figure it out on your own. There is, however, some documentation in the form of basic explanations of the main classes and structure.
For your specific question, LTText functions like an abstract base class. Some objects that inherit from LTText override the get_text method and do something more complicated, like LTTextContainer:
class LTTextContainer(LTExpandableContainer, LTText):

    def __init__(self):
        LTText.__init__(self)
        LTExpandableContainer.__init__(self)
        return

    def get_text(self):
        return ''.join(obj.get_text() for obj in self if isinstance(obj, LTText))
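To see why the getter is worth having, here is a minimal self-contained sketch of the same pattern. It does not import PDFMiner; the class names Text, Char and Line are hypothetical stand-ins for LTText, a leaf text object and a container, and only mirror the structure quoted above.

```python
class Text:
    """Stand-in for LTText: only defines the interface."""

    def get_text(self):
        raise NotImplementedError


class Char(Text):
    """Leaf object: stores its text and returns it unchanged."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


class Line(Text):
    """Container: has no text of its own, it joins its children's text."""

    def __init__(self, objs):
        self._objs = list(objs)

    def __iter__(self):
        return iter(self._objs)

    def get_text(self):
        # Same shape as the container method quoted above:
        # skip children that are not text objects.
        return "".join(obj.get_text() for obj in self if isinstance(obj, Text))


line = Line([Char("a"), Char("b"), object()])  # the plain object() is ignored
nested = Line([line, Char("!")])               # containers can nest

print(line.get_text())
print(nested.get_text())
```

Callers only ever call get_text(), so they don't need to know whether the object stores a string or builds one from its children. Reading a _text attribute directly would not work for the container in this sketch, because it never sets one.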
Usually getter and setter methods wrap functionality that may be useful to override in subclasses or update state that depends on the input. For example, LTComponent.set_bbox updates six other attributes besides self.bbox.
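A simplified model of that kind of setter is sketched below. The Box class is a hypothetical stand-in, not PDFMiner's code, and the assumption that the six derived attributes are x0, y0, x1, y1, width and height should be checked against the source of the version you use.

```python
class Box:
    """Hypothetical stand-in for a layout component with a bounding box."""

    def __init__(self, bbox):
        self.set_bbox(bbox)

    def set_bbox(self, bbox):
        # One call keeps every derived attribute consistent with bbox.
        (x0, y0, x1, y1) = bbox
        self.x0 = x0
        self.y0 = y0
        self.x1 = x1
        self.y1 = y1
        self.width = x1 - x0
        self.height = y1 - y0
        self.bbox = bbox


box = Box((0, 0, 10, 5))
box.set_bbox((1, 2, 4, 8))
print(box.x0, box.width, box.height)

# Assigning the attribute directly bypasses the setter,
# so the derived values go stale.
box.bbox = (0, 0, 100, 100)
print(box.width)  # still the width computed by the last set_bbox call
```

In this sketch, reading x0 or width directly is fine, but changes should go through set_bbox so the attributes stay in agreement.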