I would like to use the emotions, scene, and yeast datasets in my project in Anaconda (Python 3.6.5). I have used the following code:
from skmultilearn.dataset import load_dataset
X_train, y_train, feature_names, label_names = load_dataset('emotions', 'train')
It works when I am connected to the internet, but it doesn't work when I am offline. I have downloaded all three datasets named above into a folder like this:
H:\Projects\Datasets
How can I use this folder as the source for my datasets while I am offline? (I'm using Windows 10.)
The datasets I downloaded have the .rar extension (emotions.rar, scene.rar, and yeast.rar), and I downloaded them from: http://mulan.sourceforge.net/datasets-mlc.html
You can, but you first need to know the path where the dataset was stored. To find it, load the dataset once and read the path from the result. The path stays the same for a given installation, so you only need to do this once. After that, knowing the path, you can load whatever you want offline.
Example:
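import os
import pandas as pd
from sklearn.datasets import load_iris
# Get the path of the bundled CSV file (load once to find it).
# Note: some scikit-learn versions may return only the file name here, not the full path.
path = load_iris()['filename']
print(path)
# Offline load, straight from the file
df = pd.read_csv(path)
# The folder that holds the datasets: THIS IS WHAT YOU NEED
main_path_with_datasets = os.path.dirname(path)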
Once you have main_path_with_datasets (i.e. from main_path_with_datasets = os.path.dirname(path)), you can use it to list and load all the available downloaded datasets:
os.listdir(main_path_with_datasets)
['digits.csv.gz',
'wine_data.csv',
'diabetes_target.csv.gz',
'iris.csv',
'breast_cancer.csv',
'diabetes_data.csv.gz',
'linnerud_physiological.csv',
'linnerud_exercise.csv',
'boston_house_prices.csv']
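EDIT for skmultilearn: the same idea applies. After one online load, the dataset is stored as a .scikitml.bz2 dump in the scikit_ml_learn_data folder under your user directory, and load_dataset_dump can read that file offline (replace myname with your own user name):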
from skmultilearn.dataset import load_dataset_dump
path = 'C:\\Users\\myname\\scikit_ml_learn_data\\'
X, y, feature_names, label_names = load_dataset_dump(path + 'emotions-train.scikitml.bz2')
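To avoid hard-coding the user name, the path can be built from the home directory instead. The following is only a sketch: dump_path and load_offline are hypothetical helper names, and it assumes the cache lives in a scikit_ml_learn_data folder in your home directory and that files are named <dataset>-<variant>.scikitml.bz2, as in the example above. Check your own folder if the names differ.

```python
import os

def dump_path(set_name, variant, data_home=None):
    """Build the expected path of a cached scikit-multilearn dump file."""
    if data_home is None:
        # Assumption: the cache folder sits in the user's home directory.
        data_home = os.path.join(os.path.expanduser('~'), 'scikit_ml_learn_data')
    # Assumption: dumps are named '<dataset>-<variant>.scikitml.bz2'.
    return os.path.join(data_home, '{}-{}.scikitml.bz2'.format(set_name, variant))

def load_offline(set_name, variant, data_home=None):
    """Load a previously downloaded dump without using the network."""
    path = dump_path(set_name, variant, data_home)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            'No cached dump at {}; load the dataset once while online.'.format(path))
    # Imported here so dump_path can be used without scikit-multilearn installed.
    from skmultilearn.dataset import load_dataset_dump
    return load_dataset_dump(path)
```

With this in place, X, y, feature_names, label_names = load_offline('emotions', 'train') should behave like the snippet above, and the same call with 'scene' or 'yeast' works once those dumps have been downloaded. If you copied the dumps elsewhere, pass that folder as data_home.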