python · tensorflow · machine-learning · object-detection

Trained model detects almost everything as one class after long training


I trained a custom person detector using TensorFlow and a pretrained Inception model. After a few thousand steps, with the loss averaging between 1 and 2, I stopped the training and tested it with a live video. The result was quite good, with only a few false positives. It could detect some people but not everyone, so I decided to continue training the model until I got an average loss below 1, then tested it again. It now detects almost everything as a person, even the whole frame of the video, and even when there is no object present. The model seems to work great on pictures but not on videos. Is that overfitting?

Sorry, I forgot how many steps it was; I accidentally deleted the training folder that contained the ckpt and tfevents files.

edit: I forgot that I am also training the same model with the same dataset, but a higher batch size, on a cloud instance as a backup, and it is now at a higher step count. I'll edit the post later and provide the info from TensorBoard once I've finished downloading and testing the model from the cloud.

edit2: I downloaded the model trained for 200k steps from the cloud and it is working: it detects persons, but sometimes recognizes the whole frame as "person" for less than a second when I am moving the camera. I guess this could be improved by continuing to train the model. Total Loss on TensorBoard
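
For context, the live test is roughly the loop below (a minimal sketch assuming a TF2-style SavedModel export; the model path, camera index, and the 0.5 score threshold are placeholders, not my exact setup):

```python
# Sketch: run the exported detector on a live feed and draw only confident boxes.
# The export path and the 0.5 score threshold are assumptions.
import cv2
import numpy as np
import tensorflow as tf

detect_fn = tf.saved_model.load("exported_model/saved_model")  # assumed export path

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The model expects a batched uint8 RGB tensor of shape [1, H, W, 3].
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    detections = detect_fn(tf.convert_to_tensor(rgb[np.newaxis, ...]))

    boxes = detections["detection_boxes"][0].numpy()    # normalized [ymin, xmin, ymax, xmax]
    scores = detections["detection_scores"][0].numpy()
    h, w = frame.shape[:2]
    for box, score in zip(boxes, scores):
        if score < 0.5:  # skip low-confidence (often near-whole-frame) detections
            continue
        ymin, xmin, ymax, xmax = box
        cv2.rectangle(frame, (int(xmin * w), int(ymin * h)),
                      (int(xmax * w), int(ymax * h)), (0, 255, 0), 2)

    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```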

For now, I'll just continue the training on the cloud and try to document the results of each test. I'll also try resizing some images in my dataset, train on my local machine using MobileNet, and compare the results of the two models.
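
For the resizing step, something like the Pillow sketch below should do it (the folder names and the 600 px limit are assumptions, and any pixel-coordinate annotations would have to be rescaled by the same factor):

```python
# Sketch: downscale every image in a folder so its longer side is at most 600 px.
# Folder names and the 600 px limit are assumptions; if annotations store pixel
# coordinates, they must be rescaled by the same factor.
from pathlib import Path
from PIL import Image

SRC = Path("images_raw")      # assumed input folder
DST = Path("images_resized")  # assumed output folder
DST.mkdir(exist_ok=True)
MAX_SIDE = 600

for path in SRC.glob("*.jpg"):
    with Image.open(path) as img:
        scale = MAX_SIDE / max(img.size)
        if scale < 1.0:
            new_size = (round(img.width * scale), round(img.height * scale))
            img = img.resize(new_size, Image.BILINEAR)
        img.save(DST / path.name)
```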


Solution

  • As you say the model did well with fewer training iterations, I guess the pre-trained model could already detect the person class and your training set made the detection worse.

    The model seems to work great on pictures but not on videos

    If your single pictures are detected fine, then videos should work too; the only likely difference is the video's image resolution and quality. So compare the resolution of your training images with that of the video frames.
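
    A quick way to make that comparison (a sketch; the image and video paths are placeholders) is to print a training image's size next to the video's frame size:

    ```python
    # Sketch: print a training image's size next to the video's frame size so a
    # large resolution mismatch is easy to spot. Paths are placeholders.
    import cv2
    from PIL import Image

    with Image.open("dataset/example.jpg") as img:   # assumed training image
        print("training image:", img.size)           # (width, height)

    cap = cv2.VideoCapture("test_video.mp4")         # assumed test video (or 0 for a webcam)
    print("video frame:",
          (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
           int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))))
    cap.release()
    ```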

    Is that overfitting?

    About the images and videos you mention: if the images were used in training, you should not use them to evaluate the model. An overfitted model will detect objects in the training images but not in any other images.
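
    A simple way to keep the two sets separate (a sketch; the folder name and the 80/20 ratio are assumptions) is to split the image list once, before generating the training and evaluation records:

    ```python
    # Sketch: shuffle the image list once and hold out 20% for evaluation, so the
    # model is never scored on images it was trained on. Folder and ratio are assumptions.
    import random
    from pathlib import Path

    images = sorted(Path("images").glob("*.jpg"))  # assumed dataset folder
    random.seed(0)                                  # reproducible split
    random.shuffle(images)

    split = int(0.8 * len(images))
    train_set, eval_set = images[:split], images[split:]
    print(f"train: {len(train_set)} images, eval: {len(eval_set)} images")
    ```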

    Since the model is producing too many detections, I don't think this is overfitting; it is more likely an issue with your dataset. I think:

    1. You have too little data to train on.

    2. The network is too big and complicated for the amount of data. Try a smaller network such as VGG, Inception v1, or SSD MobileNet.

    3. The image resolution used in the training set is very different from that of the evaluation images.

    4. The learning rate is important, but I think in your case it's fine.

    Check the dataset you used for training carefully and use as much data as you can. These are the issues I have generally run into and wasted time on.
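
    As a starting point for that check, a quick audit like the sketch below shows how many images and annotated boxes you actually have and which resolutions dominate (it assumes Pascal VOC-style XML annotations, which may not match your setup):

    ```python
    # Sketch: count images, annotated boxes, and the spread of image resolutions.
    # Assumes Pascal VOC-style .xml files; adjust the parsing to your annotation format.
    from collections import Counter
    from pathlib import Path
    import xml.etree.ElementTree as ET

    ANNOT_DIR = Path("annotations")  # assumed annotation folder

    n_images, n_boxes = 0, 0
    resolutions = Counter()
    for xml_path in ANNOT_DIR.glob("*.xml"):
        root = ET.parse(xml_path).getroot()
        size = root.find("size")
        resolutions[(size.find("width").text, size.find("height").text)] += 1
        n_images += 1
        n_boxes += len(root.findall("object"))

    print(f"{n_images} images, {n_boxes} boxes")
    print("most common resolutions:", resolutions.most_common(5))
    ```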