Why does the Tesseract OCR engine use a global thresholding technique such as Otsu binarization? Aren't local thresholding techniques (e.g. Sauvola, Niblack) more effective at extracting text from images?
Tesseract was used in the Google Books project, and AFAIK they ran tests to find the best binarization method and Otsu turned out to be the most universal. If Otsu is not the best fit for your case, you can apply another binarization algorithm yourself before sending the image to Tesseract.
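As a rough illustration of the "binarize it yourself first" route, here is a minimal NumPy sketch of Sauvola local thresholding. The function name `sauvola_binarize` and the parameters `window`, `k`, and `r` are my own naming, not anything from Tesseract, and the defaults are only common starting values that you would likely need to tune. It assumes a grayscale image with dark text on a lighter background.

```python
import numpy as np


def sauvola_binarize(gray, window=25, k=0.2, r=128.0):
    """Sauvola local thresholding (illustrative sketch, not Tesseract code).

    gray   : 2-D array of grayscale values (0..255), dark text on light paper
    window : odd side length of the local neighbourhood, in pixels
    k, r   : Sauvola parameters (assumed defaults, tune for your images)
    Returns a uint8 image where 0 = text and 255 = background.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    pad = window // 2

    # Reflect-pad so every pixel has a full window around it.
    padded = np.pad(gray, pad, mode="reflect")

    # Integral images of values and squared values, with a leading zero
    # row and column so window sums reduce to four lookups.
    s1 = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    s2 = np.pad(padded ** 2, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

    def window_sum(s):
        return (s[window:window + h, window:window + w]
                - s[:h, window:window + w]
                - s[window:window + h, :w]
                + s[:h, :w])

    n = window * window
    mean = window_sum(s1) / n
    # Clip tiny negative values caused by floating-point rounding.
    var = np.maximum(window_sum(s2) / n - mean ** 2, 0.0)
    std = np.sqrt(var)

    # Sauvola threshold: T = m * (1 + k * (s / R - 1))
    threshold = mean * (1.0 + k * (std / r - 1.0))
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```

The resulting array can be saved as an image (e.g. PNG) and passed to Tesseract in place of the original. If you would rather not maintain this yourself, scikit-image ships `threshold_sauvola` and `threshold_niblack` in `skimage.filters`, which compute the same kind of per-pixel threshold.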