Tags: python, machine-learning, scikit-learn, data-science, logistic-regression

NameError: name 'X' is not defined at X = X.toarray() when converting a dataset loaded with load_svmlight_file() to an ndarray


from sklearn.datasets import load_svmlight_file
def get_data(dn):
    # load_svmlight_file loads dataset into sparse CSR matrix
    X,Y = load_svmlight_file(dn)
    print(type(X)) # this prints a SciPy sparse CSR matrix type, not numpy.ndarray
    return X,Y


# convert X to ndarray
X = X.toarray()
print(type(X))
    
# As you are going to implement logistic regression, you have to convert the labels into 0 and 1 
Y = np.where(Y == -1, 0, 1)

When I run the code, I get the following error: X = X.toarray() NameError: name 'X' is not defined. The code is meant to convert this dataset, which I downloaded as follows: url = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/diabetes' wget.download(url,'Assingment1')


Solution

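  • You never call get_data(dn), so the X and Y inside the function stay local to it, and no X exists at the top level when X = X.toarray() runs. Call the function first and assign what it returns, then convert X to an array.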

    It should be something like this:

    import numpy as np
    from sklearn.datasets import load_svmlight_file

    def get_data(dn):
        # load_svmlight_file returns the features as a SciPy sparse CSR matrix
        X, Y = load_svmlight_file(dn)
        print(type(X))  # a SciPy sparse CSR matrix type, not numpy.ndarray
        return X, Y
    
    # Call the function first so that X and Y exist in this scope.
    # dn is the path to the downloaded dataset file; 'Assingment1' assumes
    # that is where wget.download saved it.
    X, Y = get_data('Assingment1')

    # convert X to ndarray
    X = X.toarray()
    print(type(X))  # <class 'numpy.ndarray'>
    
    # As you are going to implement logistic regression, you have to convert
    # the labels into 0 and 1
    Y = np.where(Y == -1, 0, 1)
    

    Once get_data is called and its return values are assigned, X and Y are defined before X.toarray() runs. Note that np.where also needs the numpy import at the top.
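
The scoping rule behind the error can be reproduced without scikit-learn. The following is a minimal sketch using plain lists; get_data_demo and its toy values are hypothetical stand-ins for get_data and the diabetes dataset, not part of the original code.

```python
def get_data_demo():
    # X and Y are local names: they exist only inside this function.
    X = [[6.0, 148.0], [1.0, 85.0]]
    Y = [-1, 1]
    return X, Y

# Calling the function without assigning its result defines nothing here.
get_data_demo()
try:
    X  # no X has been assigned in this scope yet
    error_message = None
except NameError as exc:
    error_message = str(exc)
print(error_message)

# Assigning the return values is what creates X and Y in this scope.
X, Y = get_data_demo()

# Same label mapping as np.where(Y == -1, 0, 1), written without NumPy.
labels = [0 if y == -1 else 1 for y in Y]
print(labels)
```

The first part fails for the same reason as the code in the question: defining or even calling a function does not create its local variables in the calling scope. Only the assignment X, Y = get_data_demo() does.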