I am new to OpenCV and TensorFlow. I have built a classifier with TensorFlow 2.0 that recognizes the 26 letters of the American Sign Language (ASL) alphabet.
This is the CNN code.
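# Designing our CNN
# IMAGE_SIZE and K (the number of output classes) are defined elsewhere in the full notebook
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, MaxPooling2D,
                                     Flatten, Dropout, Dense)
from tensorflow.keras.models import Model
i = Input(shape=(IMAGE_SIZE[0], IMAGE_SIZE[1], 3))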
x = Conv2D(32, (3, 3), activation='relu', padding='same')(i)
x = BatchNormalization()(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
# x = Dropout(0.2)(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
# x = Dropout(0.2)(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)
# x = Dropout(0.2)(x)
# x = GlobalMaxPooling2D()(x)
x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(K, activation='softmax')(x)
model = Model(i, x)
Here's the link to the full code. https://colab.research.google.com/drive/1_9MVqaRpk5UnZxc8l4OC78JaHlXkAwrL
This is a preview of how an image is classified.
It recognizes all 26 letters with decent accuracy. Here is the confusion matrix.
I saved the model as an h5 file; it classifies 100 x 100 images that contain only the hand.
Later I got the feed from the webcam using OpenCV, but I am not sure how to use my model to detect the hand and draw a bounding box around it, so that I can extract the hand and feed it to the ASL CNN classifier. I tried some Haar cascades for hand detection, but they do not detect the hand reliably.
How can I detect the hand from the video feed like the one in this image?
I was thinking of using YOLO, but I am not sure how to train it on custom hand images, or how to feed my h5 file to YOLO and use it to draw bounding boxes around the hands in the live webcam feed.
Any links to resources are welcome. Thank you in advance.
For detection with YOLOv3 or YOLOv4, you can try this:
https://github.com/cansik/yolo-hand-detection
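To give an idea of how a YOLO hand detector could plug into OpenCV, here is a minimal sketch. It assumes the network was loaded with `cv2.dnn.readNetFromDarknet` from a Darknet cfg/weights pair trained for a single "hand" class, and that it expects 416 x 416 input; both are assumptions to check against the model you actually use. The function names `yolo_box_to_crop` and `detect_hand` and the 20% margin are only illustrative. Darknet models read through `cv2.dnn` report each box as a normalized center and size, so the first helper converts that into a square pixel crop that suits a 100 x 100 classifier.

```python
def yolo_box_to_crop(cx, cy, w, h, frame_w, frame_h, margin=0.2):
    """Convert a normalized YOLO box (center x/y, width, height in [0, 1])
    into a pixel crop (x1, y1, x2, y2) that is square before clamping."""
    # Scale the normalized box to pixel units.
    box_w = w * frame_w
    box_h = h * frame_h
    center_x = cx * frame_w
    center_y = cy * frame_h
    # Use the larger side plus a margin so the whole hand fits in a square.
    half = max(box_w, box_h) * (1.0 + margin) / 2.0
    # Clamp to the frame; near the border the crop may stop being square.
    x1 = max(0, int(round(center_x - half)))
    y1 = max(0, int(round(center_y - half)))
    x2 = min(frame_w, int(round(center_x + half)))
    y2 = min(frame_h, int(round(center_y + half)))
    return x1, y1, x2, y2


def detect_hand(net, frame, conf_threshold=0.5, input_size=416):
    """Return the crop box of the most confident detection, or None.

    `net` is assumed to be a cv2.dnn network loaded from Darknet files.
    """
    import cv2  # imported here so the helper above works without OpenCV

    frame_h, frame_w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (input_size, input_size),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    best_row, best_score = None, conf_threshold
    for output in outputs:
        for row in output:
            # Row layout: cx, cy, w, h, objectness, then one score per class.
            score = float(row[5:].max())
            if score > best_score:
                best_row, best_score = row, score
    if best_row is None:
        return None
    cx, cy, w, h = (float(v) for v in best_row[:4])
    return yolo_box_to_crop(cx, cy, w, h, frame_w, frame_h)
```

Because only the single highest-scoring box is kept, this sketch skips non-maximum suppression; tracking several hands at once would need it.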
As for the dataset:
In general, there are two types of datasets:
To see how to train on them, or on your own data, check https://github.com/cansik/yolo-hand-detection
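Once a detector gives you a box, the rest of the pipeline is: crop the hand, resize it to 100 x 100, and pass it to your saved classifier. The sketch below is one way to wire that up, under several assumptions: the file name `asl_model.h5` is a placeholder, `find_hand` is a hypothetical callable that takes a frame and returns `(x1, y1, x2, y2)` or `None`, class indices 0 to 25 are assumed to map to A to Z in order, and the RGB conversion and `/ 255.0` scaling are assumed to match the training preprocessing. Adjust each of these to what your notebook actually does.

```python
import string

# Assumption: class index 0..25 corresponds to the letters A..Z in order.
LETTERS = string.ascii_uppercase


def decode_prediction(probs):
    """Return (letter, confidence) for one softmax output vector."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return LETTERS[best], float(probs[best])


def run_webcam(find_hand, model_path="asl_model.h5"):
    """Classify the hand found by `find_hand` in each webcam frame."""
    # Heavy imports are kept local so decode_prediction stays dependency-free.
    import cv2
    import numpy as np
    from tensorflow.keras.models import load_model

    model = load_model(model_path)
    cap = cv2.VideoCapture(0)  # default webcam
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            box = find_hand(frame)
            if box is not None:
                x1, y1, x2, y2 = box
                hand = frame[y1:y2, x1:x2]
                if hand.size > 0:
                    # Match the classifier input: 100 x 100, RGB, scaled to [0, 1]
                    # (the colour order and scaling are assumptions).
                    hand = cv2.resize(hand, (100, 100))
                    hand = cv2.cvtColor(hand, cv2.COLOR_BGR2RGB)
                    hand = hand.astype("float32") / 255.0
                    probs = model.predict(hand[np.newaxis, ...], verbose=0)[0]
                    letter, conf = decode_prediction(probs)
                    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                    cv2.putText(frame, f"{letter} {conf:.2f}",
                                (x1, max(20, y1 - 10)),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
            cv2.imshow("ASL", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Note that the h5 classifier is not fed into YOLO at all: the detector only locates the hand, and the classifier runs separately on the cropped region.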