from sklearn.datasets import load_svmlight_file
def get_data(dn):
    # load_svmlight_file loads the dataset into a sparse CSR matrix
    X, Y = load_svmlight_file(dn)
    print(type(X))  # a SciPy sparse CSR matrix, not numpy.ndarray
    return X, Y
# convert X to ndarray
X = X.toarray()
print(type(X))
# As you are going to implement logistic regression, you have to convert the labels into 0 and 1
Y = np.where(Y == -1, 0, 1)
When I run the code, I get the following error:
X = X.toarray()
NameError: name 'X' is not defined
The code is meant to convert this dataset:
url = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/diabetes'
wget.download(url, 'Assingment1')
You never call get_data(dn), so X is not defined when X.toarray() runs. Call the function first and assign its return values before converting X to an array.
It should look something like this:
import numpy as np
from sklearn.datasets import load_svmlight_file
def get_data(dn):
    # load_svmlight_file loads the dataset into a sparse CSR matrix
    X, Y = load_svmlight_file(dn)
    print(type(X))  # a SciPy sparse CSR matrix
    return X, Y
# X, Y = get_data(dn)  # uncomment this line and pass the dn argument you want

# convert X to ndarray
X = X.toarray()
print(type(X))
# As you are going to implement logistic regression, you have to convert the
# labels into 0 and 1
Y = np.where(Y == -1, 0, 1)
Uncomment the call to get_data and pass it the dn argument (the path to your dataset file); X and Y will then be defined at module level.
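For reference, here is a minimal end-to-end sketch of the same flow with the function actually called. It is only an illustration: the three-row sample and the file name sample.libsvm are made up so the snippet runs without downloading anything. In your case you would pass the path of the diabetes file you downloaded instead.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import load_svmlight_file


def get_data(dn):
    """Load a LIBSVM-format file; X comes back as a SciPy sparse CSR matrix."""
    X, Y = load_svmlight_file(dn)
    return X, Y


# Hypothetical sample in LIBSVM format: "<label> <index>:<value> ..."
sample = "-1 1:0.5 2:1.0\n+1 1:0.25 3:2.0\n-1 2:0.75\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.libsvm")  # assumed file name
    with open(path, "w") as f:
        f.write(sample)
    # Calling the function is what defines X and Y at module level.
    X, Y = get_data(path)

X = X.toarray()              # sparse CSR matrix -> dense numpy.ndarray
Y = np.where(Y == -1, 0, 1)  # -1 becomes 0, every other label becomes 1

print(type(X), X.shape)
print(Y)
```

Note that np.where(Y == -1, 0, 1) maps every label other than -1 to 1, which is fine for a binary dataset labelled -1/+1 but would silently merge classes on anything else.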