How does PaddleOCR performance compare to Tesseract?


I recently came across PaddleOCR and am wondering how this OCR system compares to Tesseract. Are there any data or benchmarks available?


Solution

  • I found a comparison between PaddleOCR 2 and Tesseract 4, but only for English texts. Briefly summarized:

    1. PaddleOCR is slightly slower than Tesseract on CPUs, but with GPU support it is 46% faster than Tesseract on a standard GPU.
    2. Without post-processing, PaddleOCR's mistakes are mainly missing whitespace between words and punctuation symbols. However, these errors can be easily corrected. After post-processing, the accuracy is comparable to Tesseract's (1% lower).
    3. The pre-trained model for English is only about 10% of the file size of Tesseract's English trained data (2 MB vs. 23 MB).

    For Chinese texts, which seem to be the main priority of PaddleOCR at the moment, the situation could be different.
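
To illustrate the kind of post-processing mentioned in point 2, here is a minimal sketch in Python. It is not taken from the comparison itself: `restore_spaces` and its two regular expressions are hypothetical, and the rules are an assumption about what a simple cleanup could look like. It only re-inserts a space after punctuation that is glued to the next word. Words that were merged without any punctuation between them would need a dictionary-based approach instead.

```python
import re

# Hypothetical helper for illustration; not part of PaddleOCR or Tesseract.

# A comma, semicolon, colon, exclamation or question mark directly followed by a letter.
_PUNCT_THEN_LETTER = re.compile(r"([,;:!?])(?=[A-Za-z])")

# A period between a lowercase and an uppercase letter, which is most likely
# a sentence boundary. This leaves "3.14", "e.g." and "example.com" untouched.
_PERIOD_THEN_CAPITAL = re.compile(r"(?<=[a-z])\.(?=[A-Z])")


def restore_spaces(text: str) -> str:
    """Insert a space after punctuation that is glued to the following word."""
    text = _PUNCT_THEN_LETTER.sub(r"\1 ", text)
    text = _PERIOD_THEN_CAPITAL.sub(". ", text)
    return text


if __name__ == "__main__":
    raw = "Hello,world.This is a test;it works."
    print(restore_spaces(raw))
```

The rules are deliberately conservative, so they may miss some cases (for example a sentence that starts with a digit) in exchange for not damaging numbers, abbreviations, or URLs.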