Tags: computer-vision, dataset, artificial-intelligence, wordnet, imagenet

How do I find ID to download ImageNet Subset?


I am new to ImageNet and would like to download the full-sized images of one of its subsets (synsets). However, I have found it very difficult to find out which subsets are available and where to find the ID needed to download one.

All previous answers (from only 7 months ago) contain links that are now invalid. Some seem to imply that there is some sort of algorithm for constructing an ID, since it is linked to WordNet?

Essentially, I would like a dataset of plastic, plastic waste, or ideally marine debris. Any help on how to get the relevant ImageNet ID, or suggestions for other datasets, would be much appreciated!


Solution

  • I used this repo to achieve what you're looking for. Follow these steps:

    1. Create an account on the ImageNet website.
    2. Once you have been granted permission, download the list of WordNet IDs for your task.
    3. Once you have the .txt file containing the WordNet IDs, you are all set to run main.py.
    4. Adjust the number of images per class as needed.
    5. By default, the images are automatically resized to 224x224. To remove that resizing, or to implement other types of preprocessing, modify the code at line #40.

    Source: Refer to this Medium article for more details.

    You can find all 1000 ImageNet classes here.

    EDIT: The above method no longer works as of March 2021. As per this update:

    The new website is simpler; we removed tangential or outdated functions to focus on the core use case—enabling users to download the data, including the full ImageNet dataset and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

    So, to parse and search ImageNet now, you may have to use nltk and its WordNet interface.
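On the question of whether there is an "algorithm" behind the IDs: as far as I know, an ImageNet WordNet ID (WNID) is the WordNet part-of-speech letter followed by the synset offset, zero-padded to 8 digits (for example, `n02084071` for "dog"). The sketch below is a minimal illustration of that convention, not an official tool. The helper names (`offset_to_wnid`, `wnid_to_pos_offset`, `search_wnids`) are made up for this example, and it assumes the installed NLTK `wordnet` corpus is WordNet 3.0, the version ImageNet is built on.

```python
import re
from typing import List, Tuple

# A WNID is a part-of-speech letter followed by exactly 8 digits, e.g. "n02084071".
WNID_PATTERN = re.compile(r"^([nvars])(\d{8})$")


def offset_to_wnid(pos: str, offset: int) -> str:
    """Build an ImageNet-style ID from a WordNet POS letter and synset offset."""
    return f"{pos}{offset:08d}"


def wnid_to_pos_offset(wnid: str) -> Tuple[str, int]:
    """Split a WNID back into its POS letter and integer offset."""
    match = WNID_PATTERN.match(wnid)
    if match is None:
        raise ValueError(f"not a WordNet ID: {wnid!r}")
    return match.group(1), int(match.group(2))


def search_wnids(keyword: str) -> List[Tuple[str, str]]:
    """Return (wnid, definition) pairs for the noun synsets matching a keyword.

    NLTK is imported lazily so the pure helpers above work without it.
    Requires `pip install nltk` and a one-time `nltk.download("wordnet")`.
    """
    from nltk.corpus import wordnet as wn

    return [
        (offset_to_wnid(synset.pos(), synset.offset()), synset.definition())
        for synset in wn.synsets(keyword, pos=wn.NOUN)
    ]


# Example usage (needs the NLTK WordNet corpus):
#   for wnid, definition in search_wnids("plastic"):
#       print(wnid, definition)
```

Note that a WNID obtained this way is only useful if ImageNet actually contains images for that synset; many WordNet synsets are not in ImageNet, so the result still has to be checked against the class list of the dataset you download.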

    More recently, the organizers hosted a Kaggle challenge based on the original dataset with additional labels for object detection. To download the dataset you need to register a Kaggle account and join this challenge. Please note that by doing so, you agree to abide by the competition rules.

    Please be aware that this file is very large (168 GB) and the download will take anywhere from minutes to days depending on your network connection.

    Install the Kaggle CLI and set up credentials as per this guideline.

    pip install kaggle
    

    Then run these:

    kaggle competitions download -c imagenet-object-localization-challenge
    unzip imagenet-object-localization-challenge.zip -d <YOUR_FOLDER>
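Once the archive is unpacked, you can look for the classes you care about by keyword. The sketch below assumes the download includes a text file named `LOC_synset_mapping.txt` with one `<wnid> <comma-separated names>` entry per line, and that the training images sit in per-class folders named after the WNID; check both assumptions against your unzipped folder. The function names are hypothetical helpers for this example.

```python
from typing import Dict, Iterable


def parse_synset_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Parse lines such as 'n01440764 tench, Tinca tinca' into {wnid: names}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The WNID is the first token; everything after the first space is the names.
        wnid, _, names = line.partition(" ")
        mapping[wnid] = names
    return mapping


def find_classes(mapping: Dict[str, str], keyword: str) -> Dict[str, str]:
    """Return the entries whose names contain the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return {wnid: names for wnid, names in mapping.items() if keyword in names.lower()}


# Example usage (the path is an assumption about the archive layout):
#   with open("<YOUR_FOLDER>/LOC_synset_mapping.txt", encoding="utf-8") as f:
#       mapping = parse_synset_mapping(f)
#   print(find_classes(mapping, "plastic"))
```

Each matching WNID should then correspond to one folder of images, so a subset can be built by copying only those folders.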
    

    Additionally, to understand the ImageNet hierarchy, refer to this.
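Because ImageNet follows the WordNet noun hierarchy, a broad synset can be expanded into all of its more specific descendants (hyponyms). The sketch below shows one way to do that; `descendants` and `descendant_wnids` are hypothetical helper names, and the NLTK part again assumes the WordNet 3.0 corpus has been downloaded with `nltk.download("wordnet")`.

```python
from typing import Callable, Hashable, Iterable, List


def descendants(root: Hashable, children_of: Callable[[Hashable], Iterable]) -> List:
    """Collect every node reachable below root, visiting each node only once."""
    seen = set()
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children_of(node):
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


def descendant_wnids(wnid: str) -> List[str]:
    """List the WNIDs of all hyponyms below the given WNID, using NLTK's WordNet."""
    from nltk.corpus import wordnet as wn  # imported lazily; needs the WordNet corpus

    root = wn.synset_from_pos_and_offset(wnid[0], int(wnid[1:]))
    below = descendants(root, lambda synset: synset.hyponyms())
    return sorted(f"{s.pos()}{s.offset():08d}" for s in below)


# Example usage (needs the NLTK WordNet corpus):
#   print(len(descendant_wnids("n02084071")))  # synsets below "dog"
```

Only some of the returned WNIDs will have images in ImageNet, so intersect the result with the class list of the dataset you downloaded.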