Tags: ocr, tesseract, python-tesseract, openalpr, automatic-license-plate-recognition

How to create a traineddata file for Tesseract 4.1.0


I want to recognise the characters on number plates. How do I train tesseract-ocr for this kind of number plate on Ubuntu 16.04? I am not familiar with training, so please help me create a 'traineddata' file for recognising number plates.

Sample number plate whose characters I want to detect.

I have 1000 number plate images.

Please look into it. Any help would be appreciated.

So far I have tried the following commands:

tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox

tesseract eng.arial.plate3655.png eng.arial.plate3655 batch.nochop makebox

But it gives an error:

Tesseract Open Source OCR Engine v4.1.0-rc1-56-g7fbd with Leptonica
Error, cannot read input file eng.arial.plate3655.png: No such file or directory
Error during processing.

After that I tried:

tesseract plate4.png eng.arial.plate4 batch.nochop makebox

This works, but only for some plates. Now I am getting an error in Step 2.

Screenshot is attached.

Plate 4 image for training

Step 1 and Step 2 output in the terminal

File Generated after step 1 and step 2

Content of file generated after step 1 and step 2


Solution

  • Creating .traineddata for Tesseract 4

    {*Note: After installing Tesseract, open a terminal (cmd on Windows) and run the following. These steps use the legacy (non-LSTM) training tools.}

    Step 1: Make box files for images that we want to train

    Syntax:

    tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
    

    Eg:

    tesseract own.arial.exp0.jpg own.arial.exp0 batch.nochop makebox
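
With many plate images (the question mentions 1000), Step 1 can be scripted. This is only a minimal sketch, assuming the images are PNG files named plate*.png in one directory; `train_base` and `make_boxes` are hypothetical helper names, and the tesseract call is the same makebox command shown above.

```shell
#!/bin/sh
# Hypothetical helper: build the [langname].[fontname].exp[N] base name.
train_base() {
    # $1 = langname, $2 = fontname, $3 = image index N
    printf '%s.%s.exp%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: copy each plate image to the expected file name and,
# if tesseract is installed, create its box file (Step 1).
make_boxes() (
    # $1 = directory with plate*.png (assumed naming), $2 = langname, $3 = fontname
    cd "$1" || exit 1
    n=0
    for img in plate*.png; do
        [ -e "$img" ] || continue      # glob matched nothing
        base=$(train_base "$2" "$3" "$n")
        cp "$img" "$base.png"
        if command -v tesseract >/dev/null 2>&1; then
            tesseract "$base.png" "$base" batch.nochop makebox
        fi
        n=$((n + 1))
    done
)
```

For example, `make_boxes ./plates own arial` would leave own.arial.exp0.png, own.arial.exp0.box and so on in ./plates. Copying rather than renaming keeps the original files untouched.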
    

    {*Note: After making the box files, open them and correct any wrongly identified characters by hand.}
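
Hand-editing box files makes it easy to break a line. Each line of a makebox box file has the form `symbol left bottom right top page`, with coordinates measured from the bottom-left corner of the image. A small sanity check along these lines may help before Step 2; `check_box` is a hypothetical helper name, not a Tesseract tool.

```shell
#!/bin/sh
# Hypothetical helper: report malformed lines in a .box file.
# Returns non-zero if any line is malformed.
check_box() {
    # $1 = path to a .box file
    awk '
        NF != 6 {
            printf "line %d: expected 6 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        {
            # fields 2..6 (left bottom right top page) must be integers
            for (i = 2; i <= 6; i++)
                if ($i !~ /^[0-9]+$/) {
                    printf "line %d: field %d is not an integer\n", NR, i
                    bad = 1
                    next
                }
            # left must be below right, bottom below top
            if ($2 + 0 >= $4 + 0 || $3 + 0 >= $5 + 0) {
                printf "line %d: empty or inverted box\n", NR
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Usage would be `check_box own.arial.exp0.box`. It only checks the layout of each line; whether the symbol is the right character still has to be verified by eye.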

    Step 2: Create the .tr file (combines the image file and the box file)

    Syntax:

    tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train
    

    Eg:

    tesseract own.arial.exp0.jpg own.arial.exp0 box.train

    Step 3: Extract the charset from the box files (the output of this command is the unicharset file)

    Syntax:

    unicharset_extractor [langname].[fontname].[expN].box 
    

    Eg:

    unicharset_extractor  own.arial.exp0.box
    

    Step 4: Create a font_properties file based on our needs.

    Syntax:

    echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties 
    

    Eg:

    echo "arial 0 0 1 0 0" > font_properties
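
As far as I understand the legacy training tools, the font name in the training file name (the middle part of own.arial.exp0.tr) should have a matching entry in font_properties. A minimal sketch of that check follows; `font_of` and `font_is_declared` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: own.arial.exp0.tr -> arial
# (second dot-separated field of the file name).
font_of() {
    basename "$1" | cut -d. -f2
}

# Hypothetical helper: succeed only if font_properties has a 6-field line
# (name plus five 0/1 flags) for the font used in the training file name.
font_is_declared() {
    # $1 = training file name, $2 = path to font_properties
    awk -v f="$(font_of "$1")" '
        $1 == f && NF == 6 { found = 1 }
        END { exit !found }
    ' "$2"
}
```

For example, `font_is_declared own.arial.exp0.tr font_properties || echo "add arial to font_properties"`.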
    

    Step 5: Training the data.

    Syntax:

    mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr
    

    Eg:

    mftraining -F font_properties -U unicharset -O own.unicharset own.arial.exp0.tr
    

    Step 6: Character normalization training (produces the normproto file).

    Syntax:

    cntraining [langname].[fontname].[expN].tr
    

    Eg:

    cntraining own.arial.exp0.tr
    

    {*Note: Steps 5 and 6 create four files (shapetable, inttemp, pffmtable, normproto).}

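    Step 7: Rename the four files (shapetable, inttemp, pffmtable, normproto) to ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto). The rename command below is the Windows cmd form; on Ubuntu use mv with the same two arguments.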

    Syntax:

    rename filename1 filename2
    

    Eg:

    rename shapetable own.shapetable
    rename inttemp own.inttemp
    rename pffmtable own.pffmtable
    rename normproto own.normproto
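
On Ubuntu, where the question was asked, `rename` is a Perl script with a different syntax, so `mv` is the simpler choice for this step. A small sketch, with `prefix_outputs` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: add the [langname]. prefix to the four training outputs.
prefix_outputs() (
    # $1 = langname, $2 = directory holding the mftraining/cntraining output
    cd "$2" || exit 1
    for f in shapetable inttemp pffmtable normproto; do
        [ -e "$f" ] || { echo "missing: $f" >&2; exit 1; }
        mv "$f" "$1.$f"
    done
)
```

Running `prefix_outputs own .` in the training directory has the same effect as the four rename commands above, and stops early if one of the expected files was not produced.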
    

    Step 8: Create .traineddata file

    Syntax:

    combine_tessdata [langname].
    

    Eg:

    combine_tessdata own.
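
Once own.traineddata exists, it can be tried on a plate image. The sketch below makes two assumptions that are not part of the original answer: the file was built with the legacy tools above, so the legacy engine is selected with `--oem 0`, and a plate is a single line of text, hence `--psm 7`. `run_plate_ocr` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run OCR on one plate image with a custom traineddata.
run_plate_ocr() {
    # $1 = image, $2 = langname, $3 = directory containing [langname].traineddata
    [ -f "$3/$2.traineddata" ] || {
        echo "no $2.traineddata in $3" >&2
        return 1
    }
    # --tessdata-dir points tesseract at the directory with the new file
    # --oem 0 selects the legacy engine (assumption: legacy-trained data)
    # --psm 7 treats the image as a single text line (assumption)
    tesseract "$1" stdout --tessdata-dir "$3" -l "$2" --oem 0 --psm 7
}
```

For example, `run_plate_ocr plate4.png own .` prints the recognised text to the terminal.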
    

    {*Note: I used only one image (exp0) to create the traineddata. To train on more than one image, repeat with exp1, exp2, ..., expN.}
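
When training on several images, Steps 1 and 2 run once per image, while unicharset_extractor, mftraining and cntraining accept all the files in a single call. A minimal sketch for building those file lists, assuming the images are numbered exp0 to exp(N-1); `exp_files` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print lang.font.exp0.ext ... lang.font.exp(N-1).ext
# on one line, separated by spaces.
exp_files() (
    # $1 = langname, $2 = fontname, $3 = number of images, $4 = extension
    i=0
    out=""
    while [ "$i" -lt "$3" ]; do
        out="$out${out:+ }$1.$2.exp$i.$4"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
)

# Steps 3, 5 and 6 for three images (unquoted on purpose, so the list
# splits into separate arguments):
#   unicharset_extractor $(exp_files own arial 3 box)
#   mftraining -F font_properties -U unicharset -O own.unicharset $(exp_files own arial 3 tr)
#   cntraining $(exp_files own arial 3 tr)
```

The commented commands are the same ones from Steps 3, 5 and 6 above, just with more than one input file.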
