python scikit-learn multilabel-classification skmultilearn

Loading datasets in offline mode in sklearn and skmultilearn


I would like to use the emotions, scene, and yeast datasets in my project in Anaconda (Python 3.6.5). I used the following code:

from skmultilearn.dataset import load_dataset
X_train, y_train, feature_names, label_names = load_dataset('emotions', 'train')

It works when I am connected to the internet, but it fails when I am offline. I have downloaded all three datasets into this folder:

H:\Projects\Datasets

How can I use this folder as the source for the datasets while I am offline? (I'm using Windows 10.)

The files I downloaded are .rar archives (emotions.rar, scene.rar, and yeast.rar), and I got them from: http://mulan.sourceforge.net/datasets-mlc.html


Solution

  • You can, but you first need to know the path where the dataset is stored. To find it, load the dataset once and read the path from the result. The path stays the same on a given installation, so you only need to do this once. Knowing the path, you can then load whatever you want offline.

    Example:

    import os

    import pandas as pd
    from sklearn.datasets import load_iris
    
    # get the path of the file the dataset was loaded from
    path = load_iris()['filename']
    print(path)
    
    # offline load, reading the file directly
    df = pd.read_csv(path)
    
    # the folder that holds the datasets: THIS IS WHAT YOU NEED
    main_path_with_datasets = os.path.dirname(path)
    

    Once you have main_path_with_datasets (from os.path.dirname(path)), you can use it to load any of the datasets stored in that folder. Listing the folder shows what is available:

    os.listdir(main_path_with_datasets)
    
    ['digits.csv.gz',
     'wine_data.csv',
     'diabetes_target.csv.gz',
     'iris.csv',
     'breast_cancer.csv',
     'diabetes_data.csv.gz',
     'linnerud_physiological.csv',
     'linnerud_exercise.csv',
     'boston_house_prices.csv']
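
Depending on the scikit-learn version, load_iris()['filename'] may hold only the bare file name ('iris.csv') rather than a full path, in which case os.path.dirname(path) comes back empty. A minimal sketch of a fallback follows. The helper name resolve_dataset_path is hypothetical, and the 'data' subfolder of the sklearn.datasets package is an assumption about where the bundled CSV files live.

```python
import os


def resolve_dataset_path(filename, datasets_pkg_dir):
    """Return a full path for a bundled dataset file (hypothetical helper).

    filename: the value of load_iris()['filename'], either a full path
              or a bare name such as 'iris.csv'.
    datasets_pkg_dir: directory of the sklearn.datasets package.
    """
    if os.path.isabs(filename):
        # Already a full path: nothing to do.
        return filename
    # Assumption: the bundled CSV files sit in the package's 'data' subfolder.
    return os.path.join(datasets_pkg_dir, "data", filename)


# Usage (requires scikit-learn):
# import sklearn.datasets
# from sklearn.datasets import load_iris
# pkg_dir = os.path.dirname(sklearn.datasets.__file__)
# path = resolve_dataset_path(load_iris()["filename"], pkg_dir)
# main_path_with_datasets = os.path.dirname(path)
```

Passing the package directory in as an argument keeps the helper itself free of any scikit-learn import, so it can be checked on its own.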
    
    

    EDIT for skmultilearn

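    from skmultilearn.dataset import load_dataset_dump
    
    # folder that holds the stored .scikitml.bz2 files (adjust to your machine)
    path = 'C:\\Users\\myname\\scikit_ml_learn_data\\'
    
    # read the stored dump straight from disk, no internet connection needed
    X, y, feature_names, label_names = load_dataset_dump(path + 'emotions-train.scikitml.bz2')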
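
To cover emotions, scene, and yeast without repeating the path handling, the call above could be wrapped as sketched below. The helper names dump_path and load_offline are hypothetical, and the '<name>-<variant>.scikitml.bz2' file name pattern is only inferred from the single file name shown above, so check it against the files actually present in your folder.

```python
import os


def dump_path(data_home, set_name, variant):
    """Build the path of a stored dump, e.g. <data_home>/emotions-train.scikitml.bz2.

    The '<name>-<variant>.scikitml.bz2' pattern is an assumption based on
    the one file name shown in the answer.
    """
    return os.path.join(data_home, "{}-{}.scikitml.bz2".format(set_name, variant))


def load_offline(data_home, set_name, variant):
    """Load one stored dump from disk (hypothetical helper)."""
    path = dump_path(data_home, set_name, variant)
    if not os.path.isfile(path):
        # Fail early with a clear message instead of attempting a download.
        raise FileNotFoundError("No stored dump at " + path)
    # Imported lazily so dump_path stays usable without skmultilearn installed.
    from skmultilearn.dataset import load_dataset_dump
    return load_dataset_dump(path)


# Usage, with the folder from the snippet above:
# X, y, feature_names, label_names = load_offline(
#     "C:\\Users\\myname\\scikit_ml_learn_data", "emotions", "train")
```

Checking for the file first means a missing dataset shows up as a FileNotFoundError naming the expected path, which is easier to diagnose while offline.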