Suppose I've created two objects of TessBaseAPI, xapi and yapi, initialized by calling the following overload of the Init() function:
int Init(const char * datapath,
const char * language,
OcrEngineMode oem,
char ** configs,
int configs_size,
const GenericVector< STRING > * vars_vec,
const GenericVector< STRING > * vars_values,
bool set_only_non_debug_params
);
passing exactly identical arguments.
Since the objects are initialized with identical arguments, at this point xapi and yapi are assumed to be identical from a behavioral[1] perspective. Is my assumption correct? I hope so, as I can't find any reason for the objects to be non-identical.
Now I'm going to use xapi to extract information from an image, but before that I call SetVariable() a number of times to set a few more configuration values:
bool SetVariable(const char * name, const char * value);
Then I use xapi to extract some text from an image. Once I'm done with the extraction, I do this:
xapi.Clear(); //what exactly happens here?
After the call to Clear(), can I use xapi and yapi interchangeably? In other words, can I assume that xapi and yapi are identical at this point from a behavioral[1] perspective? Can I say Clear() is actually a reset function?
1. By "behavioral", I meant performance in terms of accuracy, not speed/latency.
Since the objects are initialized with identical arguments, at this point xapi and yapi are assumed to be identical from behavioral perspective. Is my assumption correct?
From the outset there is nothing I can find to dispute this assumption.
Here is what gets cleared or reset (if you will). Clear() itself calls the following:
void TessBaseAPI::Clear() {
  if (thresholder_ != NULL)
    thresholder_->Clear();
  ClearResults();
}
Calling thresholder_->Clear() destroys the Pix (if it is not null):
// Destroy the Pix if there is one, freeing memory.
void ImageThresholder::Clear() {
  if (pix_ != NULL) {
    pixDestroy(&pix_);
    pix_ = NULL;
  }
  image_data_ = NULL;
}
ClearResults() is shown below:
void TessBaseAPI::ClearResults() {
  if (tesseract_ != NULL) {
    tesseract_->Clear();
  }
  if (page_res_ != NULL) {
    delete page_res_;
    page_res_ = NULL;
  }
  recognition_done_ = false;
  if (block_list_ == NULL)
    block_list_ = new BLOCK_LIST;
  else
    block_list_->clear();
  if (paragraph_models_ != NULL) {
    paragraph_models_->delete_data_pointers();
    delete paragraph_models_;
    paragraph_models_ = NULL;
  }
}
The page results and paragraph models are deleted and set to null, the block list is emptied (or created if it did not exist yet), and the recognition_done_ flag is reset.
tesseract_->Clear() releases the following:
void Tesseract::Clear() {
  pixDestroy(&pix_binary_);
  pixDestroy(&cube_binary_);
  pixDestroy(&pix_grey_);
  pixDestroy(&scaled_color_);
  deskew_ = FCOORD(1.0f, 0.0f);
  reskew_ = FCOORD(1.0f, 0.0f);
  splitter_.Clear();
  scaled_factor_ = -1;
  ResetFeaturesHaveBeenExtracted();
  for (int i = 0; i < sub_langs_.size(); ++i)
    sub_langs_[i]->Clear();
}
Note that SetVariable() does not affect init values. It only works for non-init variables (init variables should be passed to Init()):
bool TessBaseAPI::SetVariable(const char* name, const char* value) {
  if (tesseract_ == NULL) tesseract_ = new Tesseract;
  return ParamUtils::SetParam(name, value, SET_PARAM_CONSTRAINT_NON_INIT_ONLY,
                              tesseract_->params());
}
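To make the non-init constraint concrete, here is a minimal sketch of how such a check could behave. It is a toy model, not Tesseract's ParamUtils: the Param struct, the SetConstraint enum, the SetParam helper and the parameter names used with it are all hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a parameter table entry (not a Tesseract type).
struct Param {
  std::string value;
  bool is_init;  // true: the value may only be supplied through Init()
};

// Hypothetical counterpart of the constraint passed by SetVariable().
enum SetConstraint { kNone, kNonInitOnly };

// Returns false if the parameter is unknown or the constraint forbids the
// write; otherwise stores the value and returns true.
bool SetParam(std::map<std::string, Param>& table, const std::string& name,
              const std::string& value, SetConstraint constraint) {
  auto it = table.find(name);
  if (it == table.end()) return false;
  if (constraint == kNonInitOnly && it->second.is_init) return false;
  it->second.value = value;
  return true;
}
```

In this model, a write to a parameter flagged is_init is rejected under kNonInitOnly and the stored value stays unchanged, which is the behaviour the comment on SetVariable() describes.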
After the call to Clear(), can I use xapi and yapi interchangeably?
No. Certainly not if you used a thresholder.
Can I say Clear() is actually a reset functionality?
Not in the sense of restoring it to its initialised state. It will set some members of the original object to null, but it keeps the expensive work done for parameters like const char * datapath, const char * language and OcrEngineMode oem. It seems to be a way to free memory without obliterating the object, in line with "without actually freeing any recognition data that would be time-consuming to reload".
After calling Clear(), call either SetImage or TesseractRect before using Recognize or any of the Get* functions.
Clear() will not discard the values set with SetVariable(); they are only reset to their defaults when the object is torn down by calling End().
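The difference between per-image state and parameters can be sketched with a small stand-in class. This is a toy model under stated assumptions, not the real TessBaseAPI: MockApi and SameConfiguration are hypothetical names, and the class tracks only the pieces of state discussed above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for TessBaseAPI; it is not the real Tesseract class.
class MockApi {
 public:
  // Mirrors SetVariable(): stores a parameter value on the engine.
  bool SetVariable(const std::string& name, const std::string& value) {
    params_[name] = value;
    return true;
  }

  // Stands in for SetImage(): remembers that an image is loaded.
  void SetImage(const std::string& image) {
    image_ = image;
    has_image_ = true;
  }

  // Stands in for recognition: fails when no image has been set.
  bool Recognize() {
    if (!has_image_) return false;
    result_ = "text from " + image_;
    recognition_done_ = true;
    return true;
  }

  // Models Clear(): drops image data and results, leaves params_ untouched.
  void Clear() {
    image_.clear();
    has_image_ = false;
    result_.clear();
    recognition_done_ = false;
  }

  bool HasImage() const { return has_image_; }
  bool RecognitionDone() const { return recognition_done_; }
  const std::map<std::string, std::string>& Params() const { return params_; }

 private:
  std::map<std::string, std::string> params_;
  std::string image_;
  std::string result_;
  bool has_image_ = false;
  bool recognition_done_ = false;
};

// True when two mock objects carry the same parameter values.
bool SameConfiguration(const MockApi& a, const MockApi& b) {
  return a.Params() == b.Params();
}
```

With two fresh MockApi objects, setting a variable on one, recognizing an image and then calling Clear() leaves that object without an image or results, yet SameConfiguration() against the untouched object returns false because the variable is still set.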
Looking at the TessBaseAPI constructor, you can see what is being initialised and which of these members will be reset by Clear().
TessBaseAPI::TessBaseAPI()
  : tesseract_(NULL),
    osd_tesseract_(NULL),
    equ_detect_(NULL),
    // Thresholder is initialized to NULL here, but will be set before use by:
    // A constructor of a derived API, SetThresholder(), or
    // created implicitly when used in InternalSetImage.
    thresholder_(NULL),
    paragraph_models_(NULL),
    block_list_(NULL),
    page_res_(NULL),
    input_file_(NULL),
    output_file_(NULL),
    datapath_(NULL),
    language_(NULL),
    last_oem_requested_(OEM_DEFAULT),
    recognition_done_(false),
    truth_cb_(NULL),
    rect_left_(0), rect_top_(0), rect_width_(0), rect_height_(0),
    image_width_(0), image_height_(0) {
}
Given that the basic Init(datapath, language) overload forwards to:
Init(datapath, language, OEM_DEFAULT, NULL, 0, NULL, NULL, false);
the first three parameters (datapath, language and engine mode) are always needed, which makes sense.
If the datapath, OcrEngineMode or the language have changed, start again.
Note that the language_ field stores the last requested language that was initialized successfully, while tesseract_->lang stores the language actually used. They differ only if the requested language was NULL, in which case tesseract_->lang is set to the Tesseract default ("eng").
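The "start again" rule and the note about the two language fields can be read as a single predicate. The sketch below is an assumption about how they combine, not a copy of Tesseract's Init(): InitState and NeedsFreshStart are hypothetical names, a NULL language is modelled as an empty string, and the engine mode is a plain int.

```cpp
#include <cassert>
#include <string>

// Hypothetical snapshot of what an initialized API object remembers.
struct InitState {
  std::string datapath;
  std::string requested_language;  // plays the role of language_
  std::string actual_language;     // plays the role of tesseract_->lang
  int oem;                         // plays the role of last_oem_requested_
};

// True when a new Init() call would have to rebuild the engine from scratch.
bool NeedsFreshStart(const InitState& s, const std::string& datapath,
                     const std::string& language, int oem) {
  if (s.datapath != datapath) return true;
  if (s.oem != oem) return true;
  // The language counts as unchanged if it matches either the language that
  // was requested or the language that was actually loaded.
  return s.requested_language != language && s.actual_language != language;
}
```

Under this model, an object initialised with a NULL language (stored here as an empty string, loaded as "eng") is not rebuilt when "eng" is requested explicitly, but it is rebuilt for a different language, datapath or engine mode.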