java ocr alfresco tesseract alfresco-share

Tesseract OCR is not working properly after integration with Alfresco 5.0.d


I have integrated Tesseract OCR with Alfresco 5.0.d. My requirement is to convert the contents of PDF files to text.

This works fine for small files.

But when I upload larger files, say more than 50 MB, I get the exception below and the PDF is only partially converted: only the first few pages end up in the text file.

Please see the logs below:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)

Has anyone faced the same issue? Please help.

Thanks in advance.


Solution

  • You may need to raise the maximum source size allowed for the content transformation (PDF to text in your case) in the alfresco-global.properties file.

    The limit is set per transformer with properties like the ones below. The values are in kilobytes.

    If you are using OOoDirect:

    content.transformer.complex.OpenOffice.Pdf2swf.extensions.doc.swf.maxSourceSizeKBytes=5120
    content.transformer.complex.OpenOffice.Pdf2swf.extensions.docx.swf.maxSourceSizeKBytes=5120

    If you are using OOoJodConverter:

    content.transformer.complex.JodConverter.Pdf2swf.extensions.doc.swf.maxSourceSizeKBytes=5120
    content.transformer.complex.JodConverter.Pdf2swf.extensions.docx.swf.maxSourceSizeKBytes=5120
    

    For reference, see these related community threads and posts:

    https://community.alfresco.com/thread/211670-changing-transformation-limits-version-5b

    https://community.alfresco.com/thread/203406-how-to-config-alfresco-documents-preview-size-limit-on-42d

    https://injustfiveminutes.wordpress.com/2012/11/28/docx-pptx-document-preview-fails-on-alfresco-4-2-c/
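
    Note that the limits shown above are in kilobytes, so 5120 allows sources of up to 5 MB, which is still far below the 50 MB files from the question. Here is a minimal sketch with the limit raised to 50 MB. It assumes the JodConverter subsystem and reuses only the property keys listed in this answer. Whether these keys cover your PDF-to-text path is an assumption that you should verify against your installation.

    ```properties
    # alfresco-global.properties
    # Values are in kilobytes: 50 MB * 1024 = 51200 KB.
    # Assumption: these keys (copied from the answer above) govern the
    # transformation that is being cut off; adjust them to your transformer.
    content.transformer.complex.JodConverter.Pdf2swf.extensions.doc.swf.maxSourceSizeKBytes=51200
    content.transformer.complex.JodConverter.Pdf2swf.extensions.docx.swf.maxSourceSizeKBytes=51200
    ```

    Alfresco typically needs a restart before changes to alfresco-global.properties take effect.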