python machine-learning scikit-learn data-science logistic-regression

X = X.toarray() raises NameError: name 'X' is not defined when converting a dataset loaded with load_svmlight_file() to an ndarray

import numpy as np
from sklearn.datasets import load_svmlight_file

def get_data(dn):
    # load_svmlight_file loads the dataset into a sparse CSR matrix
    X, Y = load_svmlight_file(dn)
    print(type(X))  # a scipy.sparse CSR matrix, not a numpy.ndarray
    return X, Y


# convert X to ndarray
X = X.toarray()
print(type(X))
    
# As you are going to implement logistic regression, you have to convert the labels into 0 and 1 
Y = np.where(Y == -1, 0, 1)

When I run the code, I get the following error:

X = X.toarray()
NameError: name 'X' is not defined

The code is meant to convert this dataset, which I download with:

url = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/diabetes'
wget.download(url, 'Assingment1')

Solution

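You never call the get_data(dn) function. X and Y are local variables inside get_data, so they do not exist at module level until you call the function and assign its return values. Call it before converting X to an array.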
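A minimal illustration of the scoping rule behind this error. The function make_data is hypothetical and only stands in for get_data:

```python
def make_data():
    # X exists only inside this function's local scope
    X = [1, 2, 3]
    return X


try:
    print(X)  # X has never been assigned at module level yet
except NameError as err:
    print("Before the call:", err)

X = make_data()  # binding the return value creates a module-level X
print("After the call:", X)
```

Defining a function does not run its body; only calling it and assigning the result makes X available outside.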

It should be something like this:

import numpy as np
from sklearn.datasets import load_svmlight_file
def get_data(dn):
    # load_svmlight_file loads the dataset into a sparse CSR matrix
    X, Y = load_svmlight_file(dn)
    print(type(X))  # a scipy.sparse CSR matrix, not yet a numpy.ndarray
    return X, Y
# X, Y = get_data(dn)  # uncomment this line and pass the dn (dataset file path) you want
# convert X to a dense ndarray
X = X.toarray()
print(type(X))  # now numpy.ndarray

# As you are going to implement logistic regression, convert the labels into 0 and 1
Y = np.where(Y == -1, 0, 1)

Uncomment the call to get_data on line 8 and pass it the dn argument (the path of the file you downloaded); X and Y will then be defined when the conversion lines run.
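For a version that runs without the downloaded file, the sketch below feeds load_svmlight_file a small in-memory sample instead of a path (the function accepts a binary file-like object). The three-row sample is made up purely for illustration; with the real data you would pass the path of the downloaded diabetes file as dn. Note that np.where(Y == -1, 0, 1) maps every label other than -1 to 1, which assumes the file only contains the labels -1 and +1.

```python
import io

import numpy as np
from sklearn.datasets import load_svmlight_file


def get_data(dn):
    # load_svmlight_file returns a sparse CSR matrix X and a dense label array Y
    X, Y = load_svmlight_file(dn)
    return X, Y


# Hypothetical stand-in for the downloaded file: three samples in svmlight
# format ("<label> <index>:<value> ..."), with labels -1 and +1
sample = b"-1 1:0.5 3:1.0\n+1 2:2.0\n-1 1:1.5 2:0.25 3:3.0\n"

X, Y = get_data(io.BytesIO(sample))  # call the function first so X and Y exist
print(type(X))  # a scipy.sparse CSR matrix

X = X.toarray()  # dense numpy.ndarray
print(type(X), X.shape)

Y = np.where(Y == -1, 0, 1)  # map -1 to 0 and every other label to 1
print(Y)
```

Because the sample's feature indices start at 1, load_svmlight_file's default zero_based="auto" treats them as one-based, so the dense matrix has three columns.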