So I have trained a TensorFlow model for OCR using an alphabet dataset I downloaded from here.
Creating Xtrain, Xtest and Ytrain, Ytest: the dataset contains one folder per alphabet character, each with 15k images of size 32x32.
import os
from PIL import Image
from numpy import asarray

folders = os.listdir(path)
train_max = 100   # images per class used for training
test_max = 10     # images per class used for testing
Xtrain = []
Ytrain = []
Xtest = []
Ytest = []
for folder in folders:
    folder_opened = path + folder + '/'
    count = 0
    for chars in os.listdir(folder_opened):
        count += 1
        if count <= train_max:
            # first train_max images of the class go to the training set
            image = Image.open(folder_opened + chars)
            data = asarray(image)
            Xtrain.append(data)
            Ytrain.append(folder)
        elif count > train_max and count <= train_max + test_max:
            # next test_max images go to the test set
            image = Image.open(folder_opened + chars)
            data = asarray(image)
            Xtest.append(data)
            Ytest.append(folder)
        else:
            break
My training code:
import tensorflow as tf
from pandas import factorize   # used below to turn folder names into integer labels

Xtrain = tf.keras.utils.normalize(Xtrain, axis=1)
Xtest = tf.keras.utils.normalize(Xtest, axis=1)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(30, activation=tf.nn.softmax))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(Xtrain, factorize(Ytrain)[0], epochs=40,
          validation_data=(Xtest, factorize(Ytest)[0]))
This model works perfectly for predicting images that contain a single 32x32 character.
But for a real-life application, I need to use this model to extract the entire text from a document (e.g. PAN card, ID card, passport, etc.).
What I have tried:
I tried to read the image using Pillow, convert it into a numpy array, and then call model.predict on it:
image_adhar = Image.open(path_2 + 'adhar1.jpeg')
image_adhar = asarray(image_adhar)
model.predict([image_adhar])
When doing so, I get this error:
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'tuple'> input: (<tf.Tensor 'IteratorGetNext:0' shape=(None, 500, 3) dtype=uint8>,)
Consider rewriting this model with the Functional API.
WARNING:tensorflow:Model was constructed with shape (None, 32, 32) for input KerasTensor(type_spec=TensorSpec(shape=(None, 32, 32), dtype=tf.float32, name='flatten_30_input'), name='flatten_30_input', description="created by layer 'flatten_30_input'"), but it was called on an input with incompatible shape (None, 500, 3).
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-165-bba8716b47d4> in <module>
----> 1 model.predict([image_adhar])
~\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
1725 for step in data_handler.steps():
1726 callbacks.on_predict_batch_begin(step)
-> 1727 tmp_batch_outputs = self.predict_function(iterator)
1728 if data_handler.should_sync:
1729 context.async_wait()
~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
887
888 with OptionalXlaContext(self._jit_compile):
--> 889 result = self._call(*args, **kwds)
890
891 new_tracing_count = self.experimental_get_tracing_count()
~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
931 # This is the first call of __call__, so we have to initialize.
932 initializers = []
--> 933 self._initialize(args, kwds, add_initializers_to=initializers)
934 finally:
935 # At this point we know that the initialization is complete (or less
~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in _initialize(self, args, kwds, add_initializers_to)
761 self._graph_deleter = FunctionDeleter(self._lifted_initializer_graph)
762 self._concrete_stateful_fn = (
--> 763 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
764 *args, **kwds))
765
~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
3048 args, kwargs = None, None
3049 with self._lock:
-> 3050 graph_function, _ = self._maybe_define_function(args, kwargs)
3051 return graph_function
3052
~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _maybe_define_function(self, args, kwargs)
3442
3443 self._function_cache.missed.add(call_context_key)
-> 3444 graph_function = self._create_graph_function(args, kwargs)
3445 self._function_cache.primary[cache_key] = graph_function
3446
~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
3277 arg_names = base_arg_names + missing_arg_names
3278 graph_function = ConcreteFunction(
-> 3279 func_graph_module.func_graph_from_py_func(
3280 self._name,
3281 self._python_function,
~\anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
997 _, original_func = tf_decorator.unwrap(python_func)
998
--> 999 func_outputs = python_func(*func_args, **func_kwargs)
1000
1001 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in wrapped_fn(*args, **kwds)
670 # the function a weak reference to itself to avoid a reference cycle.
671 with OptionalXlaContext(compile_with_xla):
--> 672 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
673 return out
674
~\anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in wrapper(*args, **kwargs)
984 except Exception as e: # pylint:disable=broad-except
985 if hasattr(e, "ag_error_metadata"):
--> 986 raise e.ag_error_metadata.to_exception(e)
987 else:
988 raise
ValueError: in user code:
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:1569 predict_function *
return step_function(self, iterator)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:1559 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1285 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2833 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3608 _call_for_each_replica
return fn(*args, **kwargs)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:1552 run_step **
outputs = model.predict_step(data)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:1525 predict_step
return self(x, training=False)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:1030 __call__
outputs = call_fn(inputs, *args, **kwargs)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\sequential.py:380 call
return super(Sequential, self).call(inputs, training=training, mask=mask)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\functional.py:420 call
return self._run_internal_graph(
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\functional.py:556 _run_internal_graph
outputs = node.layer(*args, **kwargs)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:1013 __call__
input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\input_spec.py:251 assert_input_compatibility
raise ValueError(
ValueError: Input 0 of layer dense_94 is incompatible with the layer: expected axis -1 of input shape to have value 1024 but received input with shape (None, 1500)
Forgive me, but I am new to Keras and TensorFlow.
I know this error has something to do with the shape of the training images and the shape of the image I passed (adhar1.jpeg): they are not the same shape (32x32 vs 500x281). But I don't know how to modify things so the model accepts my adhar1.jpeg image.
Since you have trained the model on 32x32 images, you need to give it input of the same dimensions.
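For a single cropped character, the minimal fix is just to bring the image into the shape the network was built for. A rough sketch is below; it assumes the training images were single-channel (which the (None, 32, 32) shape in the warning suggests) and uses 'char.png' as a placeholder file name for one character crop:

import numpy as np
import tensorflow as tf
from PIL import Image

# 'char.png' is a placeholder for an image containing a single character
image = Image.open('char.png').convert('L')   # force grayscale, like the training data
image = image.resize((32, 32))                # match the 32x32 training size
data = np.asarray(image)

# add a batch dimension and apply the same normalization used at training time
data = tf.keras.utils.normalize(data[np.newaxis, ...], axis=1)
pred = model.predict(data)
print(pred.argmax(axis=1))                    # index of the predicted class

A document such as a PAN card or passport, however, contains many characters at different positions and scales, so you first have to locate and crop each character before classifying it: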
Step 1: Load the input image from disk, convert it to grayscale, and blur it to reduce noise.
Step 2: Perform edge detection, find contours in the edge map, and sort the resulting contours from left to right.
Step 3: Loop over the contours, compute the bounding box of each contour, and filter out boxes that are too small or too large.
Step 4: Extract each character and threshold it so the character appears as white (foreground) on a black background, then grab the width and height of the thresholded image.
Step 5: Resize the image to 32x32 and apply padding if needed.
Step 6: Run your model on all the characters found (a rough sketch of the whole pipeline is given below).
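Here is a rough sketch of steps 1 to 6 using OpenCV, following the general approach of the tutorial linked underneath. It assumes OpenCV 4.x (where findContours returns two values), reuses path_2, model and Ytrain from your code, and uses arbitrary bounding-box limits that you will need to tune for your documents. It also assumes the training characters are white on a black background; if not, adjust the thresholding.

import cv2
import numpy as np
import tensorflow as tf
from pandas import factorize

# Step 1: load, convert to grayscale, and blur to reduce noise
image = cv2.imread(path_2 + 'adhar1.jpeg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Step 2: edge detection, find contours, sort them left to right
edged = cv2.Canny(blurred, 30, 150)
contours, _ = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[0])

chars = []
for c in contours:
    # Step 3: bounding box, filter out boxes that are too small or too large
    (x, y, w, h) = cv2.boundingRect(c)
    if not (5 <= w <= 150 and 15 <= h <= 120):   # assumed limits, tune for your documents
        continue

    # Step 4: extract the character and threshold it to white on black
    roi = gray[y:y + h, x:x + w]
    thresh = cv2.threshold(roi, 0, 255,
                           cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

    # Step 5: resize the larger side to 32 px, then pad up to a 32x32 square
    (th, tw) = thresh.shape
    if tw > th:
        resized = cv2.resize(thresh, (32, max(1, int(th * 32 / tw))))
    else:
        resized = cv2.resize(thresh, (max(1, int(tw * 32 / th)), 32))
    (rh, rw) = resized.shape
    pad_top, pad_left = (32 - rh) // 2, (32 - rw) // 2
    padded = cv2.copyMakeBorder(resized,
                                pad_top, 32 - rh - pad_top,
                                pad_left, 32 - rw - pad_left,
                                cv2.BORDER_CONSTANT, value=0)
    chars.append(padded)

# Step 6: classify every character with the same normalization as training
chars = tf.keras.utils.normalize(np.array(chars), axis=1)
preds = model.predict(chars)

# factorize returns (codes, uniques); the uniques map indices back to folder names
labels = factorize(Ytrain)[1]
text = ''.join(labels[i] for i in preds.argmax(axis=1))
print(text)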
For a more detailed walkthrough, you can look into: https://www.pyimagesearch.com/2020/08/24/ocr-handwriting-recognition-with-opencv-keras-and-tensorflow/