python scikit-learn multilabel-classification skmultilearn

Loading datasets in offline mode in sklearn and skmultilearn

I would like to use datasets: emotions, scene, and yeast in my project in anaconda (python 3.6.5). I have used the following codes:

from skmultilearn.dataset import load_dataset
X_train, y_train, feature_names, label_names = load_dataset('emotions', 'train')

It works successfully when I am connected to the internet, But when I am offline, it doesn't work! I have downloaded all 3 named above datasets in a folder like this:

H:\Projects\Datasets

How can I use this folder as my source datasets while I am offline? (I'm using windows 10)

The extensions of datasets that I have downloaded them are: .rar Like this: emotions.rar, scene.rar, and yeast.rar, and I have downloaded them from: http://mulan.sourceforge.net/datasets-mlc.html

Solution

You can but you first need to know the path that the dataset was stored to. To do this you can load once and get the path. This path will never change so you only need to do the following once in order to get the desired path. Next, knowing the path, you can load offline whatever you want.

Example:

from sklearn.datasets import load_iris
import pandas as pd, os

#get the path
path = load_iris()['filename']
print(path)

#offline load
df = pd.read_csv(path)

#the path: THIS IS WHAT YOU NEED
main_path_with_datasets = os.path.dirname(path)

Once you get the main_path_with_datasets i.e. by doing main_path_with_datasets = os.path.dirname(path), you will now have the path. You can use it to load all the available downloaded datasets.

os.listdir(main_path_with_datasets)

['digits.csv.gz',
 'wine_data.csv',
 'diabetes_target.csv.gz',
 'iris.csv',
 'breast_cancer.csv',
 'diabetes_data.csv.gz',
 'linnerud_physiological.csv',
 'linnerud_exercise.csv',
 'boston_house_prices.csv']

EDIT for skmultilearn

from skmultilearn.dataset import load_dataset_dump

path = 'C:\\Users\\myname\\scikit_ml_learn_data\\'

X, y, feature_names, label_names = load_dataset_dump(path + 'emotions-train.scikitml.bz2')