Tags: python, json, dataset, conv-neural-network, image-classification

Is there any way to match the JSON file with the dataset (images) in Python?


I'm working on a machine learning image classification project, and I found a dataset that has two parts:

  1. The images (20,000 images), numbered from 1 to 20,000 and not sorted into classes.
  2. A JSON file that holds the information and classification of the images (12 classes). The JSON file is structured as follows:
{
  "<image_number>": {
    "image_filepath": "images/<image_number>.jpg", 
    "anomaly_class": "<class_name>"
  },
  ...
}

So I'm trying to read the JSON file and split the dataset so I can deal with each class individually, then take 80% of each class as the training set and 20% as the testing set.

I tried to find a way to match the JSON file with the dataset (images) so I can sort the images into one folder per class and then divide them into training and testing sets.

Can anyone help me with that?

Thank you.


Solution

  • Something like the following would create folders for each of the classes and then move the images into them.

    import json
    import os
    from os import path

    # Open the JSON file containing the classifications
    # (the file name is a placeholder; adjust it to match yours)
    with open("classification.json", "r") as f:
        classification = json.load(f)
    # Collect the distinct class names
    classes = {i["anomaly_class"] for i in classification.values()}
    # Make one folder per class (exist_ok avoids an error if it already exists)
    for c in classes:
        os.makedirs(c, exist_ok=True)
    # Move each image listed in the JSON into the folder named after its class
    for image_number, image_data in classification.items():
        os.rename(image_data["image_filepath"], path.join(image_data["anomaly_class"], "{}.jpg".format(image_number)))
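
The question also asks for an 80/20 split within each class, which the snippet above does not cover. One possible approach, as a rough sketch only: group the image numbers by class, shuffle each group, and cut it at 80%. The function name `split_by_class` and the `train_fraction` and `seed` parameters are made up for illustration, and the code assumes the JSON has already been loaded into a dict with the structure shown in the question.

```python
import random
from collections import defaultdict


def split_by_class(classification, train_fraction=0.8, seed=0):
    """Split image numbers into train/test lists, class by class.

    `classification` is assumed to be the dict loaded from the JSON file:
    {"<image_number>": {"image_filepath": ..., "anomaly_class": ...}, ...}
    Returns two lists of (image_number, class_name) pairs.
    """
    # Group the image numbers by their class
    by_class = defaultdict(list)
    for image_number, image_data in classification.items():
        by_class[image_data["anomaly_class"]].append(image_number)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for class_name, numbers in sorted(by_class.items()):
        numbers = sorted(numbers)
        rng.shuffle(numbers)
        # The first train_fraction of each class goes to training
        cut = int(len(numbers) * train_fraction)
        train += [(n, class_name) for n in numbers[:cut]]
        test += [(n, class_name) for n in numbers[cut:]]
    return train, test
```

Calling `train, test = split_by_class(classification)` would then give two lists that can be used to move or copy the files into, say, `train/<class_name>/` and `test/<class_name>/` folders with the same `os.makedirs` and `os.rename` calls as above. Very small classes may end up with an empty test set, so it is worth checking the class counts first.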