python scikit-learn python-3.7 nameerror train-test-split

How to solve NameError: name 'n' is not defined in train_test_split of scikit-learn 0.22 without downgrading the version?

I am doing sentiment analysis using scikit-learn's train_test_split function, but I am getting NameError: name 'n' is not defined even though I have defined it. After checking various forums, I found that this error reportedly occurs in newer versions of scikit-learn (after 0.19), and the suggested solution is to downgrade scikit-learn to 0.19. My problem is that I am working with Python 3.7, Anaconda3 and Jupyter Notebook 6.0.3, and scikit-learn will not downgrade to the older version.

What should I do? How can I solve this issue?

def postprocess(data, n=1000000):
    data = data.head(n)
    data['tokens'] = data['Articles'].progress_map(tokenize)  ## progress_map is a variant of the map function plus a progress bar. Handy to monitor DataFrame creations.
    data = data[data.tokens != 'NC']
    data.reset_index(inplace=True)
    data.drop('index', inplace=True, axis=1)
    return data

data = postprocess(data)

x_train, x_test, y_train, y_test = train_test_split(np.array(data.head(n).tokens),
                                                    np.array(data.head(n).Sentiment), test_size=0.2)

Error:

NameError                                 Traceback (most recent call last)
in
----> 1 x_train, x_test, y_train, y_test = train_test_split(np.array(data.head(n).tokens),
      2                                                     np.array(data.head(n).Sentiment), test_size=0.2)

NameError: name 'n' is not defined

Thanks in advance.

Solution

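You don't seem to define n anywhere outside your postprocess function. There, n is only a parameter, local to the function, so the name does not exist at the top level where you call train_test_split. It also sounds very unlikely that such an error is due to a scikit-learn bug in recent versions (when claiming something like that, you should always include the results of your own research).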

In any case, this will most probably work (provided that there are no other issues with your code & data):

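import numpy as np
from sklearn.model_selection import train_test_split

n = 1000000  # defined at the top level, so it is visible outside postprocess
data = postprocess(data, n=n)
x_train, x_test, y_train, y_test = train_test_split(np.array(data.head(n).tokens),
                                                    np.array(data.head(n).Sentiment), test_size=0.2)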
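
The scoping point can be reproduced without scikit-learn or pandas. The following is only a minimal sketch: take_first and items are hypothetical names made up for illustration, standing in for postprocess and the DataFrame. It shows that a function parameter with a default value does not create a name at the top level, and that defining n at the top level makes the lookup succeed.

```python
def take_first(items, n=3):
    # n is a parameter: the name exists only inside take_first
    return items[:n]

items = list(range(10))
first = take_first(items)  # works, the default n=3 is used inside the function

try:
    n  # top-level lookup; no variable called n has been defined out here
    n_is_visible = True
except NameError:
    n_is_visible = False

print(first)
print(n_is_visible)

# Defining n at the top level fixes the lookup, as in the answer above
n = 3
first_again = take_first(items, n=n)
print(first_again == items[:n])
```

The same reasoning applies to the call in the question: data.head(n) is evaluated at the top level of the notebook cell, where the default argument of postprocess is not in scope.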