Tesseract uses Leptonica to load the images on which it performs OCR:
#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }
    // Open input image with leptonica library
    Pix *image = pixRead("./test1dld.png");
    api->SetImage(image);
...
However, for reading in a batch of tests, the easy way would be to use the document feeder on a copier and have the machine email the resulting single PDF file, in which each page is a bitmap. The Leptonica documentation mentions converting to PDF, but I can't find how to read from a PDF at all, much less a page at a time.
Can anyone point me to an API call that lets me read a bitmap PDF file page by page, as individual bitmaps? Preferably a C API, not a shell command.
Leptonica is an image reader, not a document (PDF) reader. Yes, it can create PDFs, but reading them is a different story.
You will need another library to extract images from a PDF. For Python I would suggest trying PyMuPDF; for C++ you can check Poppler or QPDF. For C, I am not sure whether there is a (free) solution.
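Whichever PDF library ends up rendering the pages, it will most likely hand back a raw pixel buffer per page rather than a Leptonica Pix. The sketch below shows only the glue step that follows: repacking such a buffer into tightly packed 8-bit grayscale. It assumes, and this should be checked against the chosen library's documentation, that the renderer produces 32-bit pixels stored as the bytes B, G, R, A with a row stride that may include padding. `GrayPage` and `toGray` are hypothetical names made up for this illustration, not part of Tesseract, Leptonica, or any PDF library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one rendered PDF page (not a library type).
struct GrayPage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // tightly packed, one byte per pixel
};

// Convert a 32-bit page buffer to 8-bit grayscale.
// Assumption: each pixel is stored as the bytes B, G, R, A, and each row
// occupies `stride` bytes, possibly with padding after the last pixel.
GrayPage toGray(const std::uint8_t *bgra, int width, int height, int stride) {
    assert(width > 0 && height > 0 && stride >= width * 4);
    GrayPage page;
    page.width = width;
    page.height = height;
    page.pixels.resize(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t *row = bgra + static_cast<std::size_t>(y) * stride;
        for (int x = 0; x < width; ++x) {
            const int b = row[4 * x + 0];
            const int g = row[4 * x + 1];
            const int r = row[4 * x + 2];
            // Integer approximation of the Rec. 601 luma weights.
            page.pixels[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>((299 * r + 587 * g + 114 * b) / 1000);
        }
    }
    return page;
}
```

A page converted this way could then be given to Tesseract without going through pixRead, using the raw-buffer overload of SetImage: api->SetImage(page.pixels.data(), page.width, page.height, 1, page.width), where 1 is the number of bytes per pixel and the last argument is the number of bytes per row.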