My code is:
import cv2
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions

model = ResNet50(weights='imagenet')
video_capture = cv2.VideoCapture(0)
windowName = 'frame'

ret_val, frame = video_capture.read()
frame = cv2.resize(frame, (224, 224))
cv2.imshow(windowName, frame)
# OpenCV returns BGR frames, but preprocess_input expects RGB input
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
rgb = np.expand_dims(rgb, axis=0).astype(np.float32)
rgb = preprocess_input(rgb)
preds = model.predict(rgb)
print('Predicted:', decode_predictions(preds, top=3)[0])
The results I get are:
Predicted: [('n04550184', 'wardrobe', 0.40715462), ('n04209239', 'shower_curtain', 0.09730709), ('n04005630', 'prison', 0.04603362)]
But I don't get any x, y, w, h values, so I don't know where to draw the bounding boxes.
Any tips?
You can't do this with ResNet50. It is a classification network: its final fully-connected layers collapse the spatial information, so it only outputs class probabilities for the whole image, never object locations. Use an object-detection model such as YOLO instead, which predicts bounding boxes together with class labels in a single pass and is very efficient. Another option, which avoids Keras entirely, is OpenCV's built-in Haar feature-based cascade classifiers. This link has all you need to know to train an OpenCV Haar cascade.