Tags: javascript, tensorflow, deep-learning, tesseract, tensorflow.js

Deep learning: detect features of an electric pole


I'm working on an app that has to detect all visible features of an electric pole from just 2 or 3 photos:

Detect the type of pole and its features:

[Image: pole 1]

[Image: pole 2]

Read the useful text on the engraving (not all of the text is needed):

[Image: engraving 1]

[Image: engraving 2]

I have to run the detection directly in the browser, so I plan to use TensorFlow.js on the frontend.

1 - Should I create a new model or fine-tune a pre-trained one? Which pre-trained model should I use for this kind of image?

2 - Is it possible to detect the number of cables and their types?

3 - For the engraving part, most of the time the text is unreadable even to the human eye. Is it possible to train Tesseract to read it?

Thanks!


Solution

  • Since you are dealing with a very small dataset, you can do both of the following:

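    1. Augment the dataset, i.e., artificially create more images from the ones you already have. There are many ways to do this; Keras provides ImageDataGenerator for it (newer TensorFlow versions recommend preprocessing layers such as RandomFlip and RandomRotation instead).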

    2. Use a pre-trained model with transfer learning, such as MobileNetV2. A pre-trained model helps a lot because it has already learned general image features from a large dataset (ImageNet, for the standard Keras MobileNetV2 weights) and transfers that knowledge to the dataset you are using now. More details on how to do so can be found here.
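To illustrate what augmentation does, here is a minimal sketch in plain JavaScript, assuming each photo is a grayscale image stored as a 2D array of pixel values in [0, 255]. The function names and the chosen transforms (horizontal flip, brightness shift) are illustrative assumptions, not ImageDataGenerator's API.

```javascript
// Minimal augmentation sketch. An image is assumed to be a grayscale 2D array
// (rows of pixel values in [0, 255]); every function returns a new array and
// leaves its input unchanged.

// Mirror the image left-to-right (reasonable for pole/cable photos, but NOT for
// engraving crops, because it would mirror the text).
function flipHorizontal(image) {
  return image.map((row) => row.slice().reverse());
}

// Shift every pixel by `delta`, clamping to [0, 255], to simulate lighting changes.
function adjustBrightness(image, delta) {
  return image.map((row) =>
    row.map((px) => Math.min(255, Math.max(0, px + delta)))
  );
}

// Build several variants of one photo: the original, a mirrored copy, and
// one copy per brightness offset.
function augment(image, brightnessDeltas = [-30, 30]) {
  const variants = [image, flipHorizontal(image)];
  for (const delta of brightnessDeltas) {
    variants.push(adjustBrightness(image, delta));
  }
  return variants;
}
```

With the default offsets, `augment(photo)` turns one photo into four. ImageDataGenerator and the Keras preprocessing layers apply random transforms like these on the fly during training instead of storing every variant up front.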

    You would need two models:

    1. The first one detects the poles and the cables around them. Crop the images based on the bounding boxes of the poles and the cables and label them accordingly (you would have to hand-label them, so the process can be quite tedious).

    2. The second one detects the engravings on the poles. Crop the pole images so they only include the engraved portions and label that data yourself. You can apply the same augmentation and transfer-learning approach described above to this model too.
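Both steps rely on cropping an image to a bounding box. A minimal sketch follows, assuming images are 2D arrays of pixel rows and that boxes use a hypothetical `{ x, y, width, height }` pixel format measured from the top-left corner; a real detector's output would have to be converted to this shape first.

```javascript
// Crop a detected region out of an image stored as a 2D array of pixel rows.
// Assumed (hypothetical) box format: { x, y, width, height } in pixels from the
// top-left corner. Coordinates are rounded outward and clamped to the image,
// so a box that slightly overflows the border still yields a valid crop.
function cropToBox(image, box) {
  const height = image.length;
  const width = height > 0 ? image[0].length : 0;
  const x0 = Math.max(0, Math.floor(box.x));
  const y0 = Math.max(0, Math.floor(box.y));
  const x1 = Math.min(width, Math.ceil(box.x + box.width));
  const y1 = Math.min(height, Math.ceil(box.y + box.height));
  if (x1 <= x0 || y1 <= y0) {
    return []; // the box lies completely outside the image
  }
  return image.slice(y0, y1).map((row) => row.slice(x0, x1));
}

// Turn labelled detections into labelled crops, e.g. to build the training set
// for the engraving model from the pole boxes. `label` is a hypothetical field.
function cropDetections(image, detections, wantedLabel) {
  return detections
    .filter((d) => d.label === wantedLabel)
    .map((d) => ({ label: d.label, pixels: cropToBox(image, d.box) }));
}
```

Clamping matters in practice because detector boxes near the image edge often extend a few pixels past it.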

    Lastly, to perform OCR you can use Tesseract.js, which exposes a JavaScript API you can call directly in the browser.

    Model architecture:

    Input Tensor -> Model for detecting poles and cables -> Cropped images of the poles -> Model for detecting text engravings -> Cropped images of the engravings -> Perform OCR on the engravings.
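The architecture above can be sketched as a single async function. This is only an outline under assumptions: `detectPolesAndCables`, `detectEngravings` and `recognizeText` are hypothetical placeholders passed in by the caller. In a real app the first two would wrap your TF.js models' predictions, and `recognizeText` could wrap Tesseract.js's `recognize`, whose result exposes the recognized text as `data.text`. The detection shape `{ label, score, box }` and the 0.5 score threshold are also assumptions.

```javascript
// Same hypothetical { x, y, width, height } box format as a pixel-array crop.
function cropToBox(image, { x, y, width, height }) {
  const x0 = Math.max(0, Math.floor(x));
  const y0 = Math.max(0, Math.floor(y));
  const x1 = Math.min(image.length ? image[0].length : 0, Math.ceil(x + width));
  const y1 = Math.min(image.length, Math.ceil(y + height));
  if (x1 <= x0 || y1 <= y0) return [];
  return image.slice(y0, y1).map((row) => row.slice(x0, x1));
}

// Pipeline: detect poles/cables -> crop poles -> detect engravings on each
// pole crop -> crop engravings -> OCR each engraving crop.
async function analyzePole(image, models, minScore = 0.5) {
  const detections = (await models.detectPolesAndCables(image))
    .filter((d) => d.score >= minScore); // drop low-confidence detections
  const poles = detections.filter((d) => d.label === "pole");
  const cables = detections.filter((d) => d.label === "cable");

  const engravingTexts = [];
  for (const pole of poles) {
    const poleImage = cropToBox(image, pole.box);
    if (poleImage.length === 0) continue; // box fell outside the image
    // Engraving boxes are relative to the pole crop, not the full photo.
    const engravings = (await models.detectEngravings(poleImage))
      .filter((e) => e.score >= minScore);
    for (const engraving of engravings) {
      const engravingImage = cropToBox(poleImage, engraving.box);
      if (engravingImage.length === 0) continue;
      const text = await models.recognizeText(engravingImage);
      engravingTexts.push(text.trim());
    }
  }
  return { poleCount: poles.length, cableCount: cables.length, engravingTexts };
}
```

Injecting the three model functions keeps the orchestration testable with stubs before any model is trained. Counting boxes labelled `"cable"` is one simple way to get the number of cables; distinguishing cable types would need one label per type in your training data.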

    Cropping the engravings out of the pole images before OCR should improve accuracy considerably; if you skip that step and run OCR on the whole pole image, it will likely perform badly.

    Since the models will run in the browser on mobile devices, don't forget to quantize them to reduce their download size and loading time.
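To show what weight quantization means, here is a minimal sketch of 8-bit affine (min/scale) quantization in plain JavaScript. It illustrates the idea only and is not the TensorFlow.js converter's actual code; in practice you would use the converter's quantization options (recent versions of `tensorflowjs_converter` list flags such as `--quantize_uint8` and `--quantize_float16`; check `--help` for your version).

```javascript
// Sketch of 8-bit affine weight quantization: each float weight is mapped to an
// integer in [0, 255], so storage drops from 4 bytes to 1 byte per weight.
function quantizeUint8(weights) {
  let min = Infinity;
  let max = -Infinity;
  for (const w of weights) {
    if (w < min) min = w;
    if (w > max) max = w;
  }
  // If all weights are equal, any non-zero scale works; every value maps to 0.
  const scale = max > min ? (max - min) / 255 : 1;
  const q = Uint8Array.from(weights, (w) => Math.round((w - min) / scale));
  return { q, min, scale };
}

// Recover approximate float weights; each error is at most scale / 2.
function dequantizeUint8({ q, min, scale }) {
  return Float32Array.from(q, (v) => v * scale + min);
}
```

The trade-off is a roughly 4x smaller weight file in exchange for a rounding error of at most half a quantization step per weight, which many image models tolerate with little accuracy loss; it is worth measuring on your own validation images.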