python opencv tensorflow yolo haar-classifier

Detecting and tracking the human hand with OpenCV


I am new to OpenCV and TensorFlow. I have created a classifier using TensorFlow 2.0 to recognize the 26 letters of the American Sign Language (ASL) alphabet.

This is the CNN code.

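# Designing our CNN
# IMAGE_SIZE and K (the number of classes) are defined in the full code linked below.
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense, Dropout,
                                     Flatten, Input, MaxPooling2D)
from tensorflow.keras.models import Model

i = Input(shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 3))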
x = Conv2D(32, (3, 3), activation='relu', padding='same')(i)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
# x = Dropout(0.2)(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
# x = Dropout(0.2)(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
# x = Dropout(0.2)(x)

# x = GlobalMaxPooling2D()(x)
x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(K, activation='softmax')(x)

model = Model(i, x)

Here's the link to the full code. https://colab.research.google.com/drive/1_9MVqaRpk5UnZxc8l4OC78JaHlXkAwrL

This is a preview of how an image is classified:

[image: classification preview]

It classifies all 26 letters with decent accuracy. Here is the confusion matrix: [image: confusion matrix]

I saved the model as an h5 file. It classifies 100 x 100 images that contain only the hand.

Later I was able to get the webcam feed using OpenCV, but I am not sure how to use my model to detect the hand, draw a bounding box around it, extract the hand region, and feed it to the ASL CNN classifier. I tried some Haar cascades for hand detection, but they do not detect hands reliably.

How can I detect the hand in the video feed, like in this image? [image: example of the desired hand detection]

I was thinking of using YOLO, but I am not sure how to train it on custom hand images, or how to feed my h5 file to YOLO and use it to draw bounding boxes around the hands in the live webcam feed.

Any links to the resources are welcome. Thank you in advance.


Solution

  • For detection with YOLOv3 or YOLOv4, you can try this:

    https://github.com/cansik/yolo-hand-detection

    As for the dataset:

    There are two types of datasets in general:

    You can see how to train on them, or on your own data, at https://github.com/cansik/yolo-hand-detection
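Once a detector gives you a bounding box, the remaining step is to crop that region and pass it to the h5 classifier. The two models stay separate: the detector finds the hand, and the CNN classifies the crop. Below is a minimal sketch of that glue code, not a definitive implementation. The helper names `square_crop_box` and `classify_hand` are hypothetical, and the detector is assumed to return boxes as `(x, y, w, h)` in pixels. The 20% margin, the BGR-to-RGB conversion, and the division by 255 are also assumptions; they should match whatever preprocessing was used when the classifier was trained.

```python
def square_crop_box(x, y, w, h, frame_w, frame_h, margin=0.2):
    """Expand a detector box (x, y, w, h) into a padded square inside the frame.

    Returns (left, top, right, bottom) in pixels. A square crop avoids
    distorting the hand when it is resized to the classifier's input size.
    """
    # Side of the square: the longer box edge plus a margin on both sides,
    # limited so the square always fits inside the frame.
    side = max(1, int(round(max(w, h) * (1 + 2 * margin))))
    side = min(side, frame_w, frame_h)

    # Centre the square on the box, then shift it back inside the frame.
    cx, cy = x + w / 2, y + h / 2
    left = int(round(cx - side / 2))
    top = int(round(cy - side / 2))
    left = max(0, min(left, frame_w - side))
    top = max(0, min(top, frame_h - side))
    return left, top, left + side, top + side


def classify_hand(frame, box, model, size=(100, 100)):
    """Crop the hand from a BGR frame and run the Keras classifier on it.

    Returns (class_index, probability). Assumes the model was trained on
    RGB images scaled to [0, 1]; adjust if the training code differs.
    """
    import cv2
    import numpy as np

    frame_h, frame_w = frame.shape[:2]
    left, top, right, bottom = square_crop_box(*box, frame_w, frame_h)
    crop = frame[top:bottom, left:right]
    crop = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)  # OpenCV frames are BGR
    crop = cv2.resize(crop, size)
    batch = np.expand_dims(crop.astype("float32") / 255.0, axis=0)
    probs = model.predict(batch, verbose=0)[0]
    return int(np.argmax(probs)), float(np.max(probs))
```

In the webcam loop you would load the classifier once with `tf.keras.models.load_model(...)`, run the hand detector on each frame, and call `classify_hand(frame, (x, y, w, h), model)` for every box it returns. The returned index is the position in the softmax output, so mapping it to a letter depends on the class order used during training.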