
Exclude certain classes when loading dataset with fiftyone


I am trying to get a set of images from Open Images to train an object detection classifier. The easiest way I have found to get images from Open Images is the Python library FiftyOne. With FiftyOne I can download the images belonging to a specific class by specifying that class in the load command.

My question now is: how can I exclude certain classes?

I want to train a classifier for recognizing vehicle licence plates. For the training process I need both positive and negative example images.
As I want to recognize the licence plate and not the vehicle, the negative examples should still contain vehicles.

My idea was to get the negative examples from the class "Car", but they should not be part of the class "Vehicle registration plate".

Is there a way to tell the create command from FiftyOne that it should not include images with the class "Vehicle registration plate"?

The command I am currently using is as follows:
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("open-images-v6", split="train", classes="Car", max_samples=10000)
This, however, also downloads images that belong to the class "Vehicle registration plate", which I do not want.

I do not want to use FiftyOne for anything else, apart from getting the training data.

Even though it should not have anything to do with this question:
I am going to use OpenCV for training and using the classifier.


Solution

  • After downloading images of cars, you can use the filtering capabilities of FiftyOne to separate out the positive and negative examples for your task. There is no way to specifically exclude classes when downloading a dataset from the FiftyOne Zoo.

    Open Images provides sample-level positive and negative labels indicating if a class definitely does or does not exist in the sample. These are not exhaustively labeled, though, so if the class is not present in either of the sample-level annotations, then there is no way to know if it exists or not.

    Because of that, there are a couple of ways to get all of the relevant samples for your task.

    1) Use only samples with annotated Vehicle registration plates

    from fiftyone import ViewField as F
    
    class_name = "Vehicle registration plate"
    
    # Find samples positively labeled with "Vehicle registration plate"
    pos_view = dataset.filter_labels("positive_labels", F("label") == class_name)
    
    # Find samples explicitly labeled as NOT containing a "Vehicle registration plate"
    neg_view = dataset.filter_labels("negative_labels", F("label") == class_name)
    

    This is the fastest way to get samples that you can be sure either do or don't have a license plate. However, you will be throwing away samples where license plates are not annotated.
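
    To make the three possible cases concrete (annotated as present, annotated as absent, or not annotated at all), here is a minimal sketch in plain Python. The helper `plate_status` and the toy `samples` list are hypothetical and only stand in for the sample-level label fields; they are not part of FiftyOne or of the real Open Images annotations.

```python
def plate_status(positive, negative, class_name="Vehicle registration plate"):
    """Classify one sample from its sample-level label lists.

    `positive` and `negative` are plain lists of class-name strings
    (an assumption for this sketch, not FiftyOne objects).
    """
    if class_name in positive:
        return "positive"   # class is annotated as present
    if class_name in negative:
        return "negative"   # class is annotated as absent
    return "unknown"        # class was never annotated for this sample


# Toy data standing in for three samples (not real annotations)
samples = [
    {"positive": ["Car", "Vehicle registration plate"], "negative": []},
    {"positive": ["Car"], "negative": ["Vehicle registration plate"]},
    {"positive": ["Car"], "negative": ["Tree"]},
]

statuses = [plate_status(s["positive"], s["negative"]) for s in samples]
print(statuses)  # ['positive', 'negative', 'unknown']
```

    Option 1 keeps only the first two kinds of sample; the "unknown" ones are the samples that get thrown away.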

    2) Manually filter out unlabeled samples

    If you need as much data as possible, then you can manually go through the samples where license plates were not annotated and find additional negative examples.

    from fiftyone import ViewField as F
    
    class_name = "Vehicle registration plate"
    
    # Find samples positively labeled with "Vehicle registration plate"
    pos_view = dataset.filter_labels("positive_labels", F("label") == class_name)
    
    # Find all samples without a positively labeled "Vehicle registration plate"
    neg_view = dataset.exclude(pos_view)
    

    From here, launch the FiftyOne App and tag all samples that have a plate.

    import fiftyone as fo
    
    # Open the candidate negatives in the App and tag any sample that shows a plate with "remove"
    session = fo.launch_app(view=neg_view)
    
    # Keep only the samples that were NOT tagged "remove"
    neg_view = neg_view.match_tags("remove", bool=False)
    

    You can then export the data to disk in a variety of formats to train your model. If the format you need isn't listed, you can simply iterate over your dataset and save the data manually.

    neg_view.export(
        export_dir="/path/to/dir",
        dataset_type=fo.types.COCODetectionDataset,
        label_field="detections",
    )
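
    If none of the export formats fit, the "iterate and save manually" route could look roughly like the sketch below. The helper `copy_images` is a hypothetical name introduced here for illustration; it only assumes that you can get a list of image paths for the view, for example with `neg_view.values("filepath")`.

```python
import shutil
from pathlib import Path


def copy_images(filepaths, out_dir):
    """Copy each image into out_dir and return the list of new paths.

    Files keep their original names, so this assumes the names are unique.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in filepaths:
        dst = out_dir / Path(src).name
        shutil.copy2(src, dst)  # copies file contents and metadata
        copied.append(str(dst))
    return copied


# With FiftyOne, the source paths could come from the view, e.g.:
#     copy_images(neg_view.values("filepath"), "/path/to/dir")
```

    The returned list of paths can then be written to whatever index file your training tool expects.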
    

    Once you have trained your model, I would recommend using FiftyOne to visualize and analyze its predictions, so that you can see where the model performs poorly and improve it.