This is sentiment analysis code, and every time I change my input it takes 10-15 minutes to run. How can I reduce that time? Is saving the classifier with pickle (or some other method) preferable? Other functions are not shown here.
import csv

import nltk
import nltk.classify.util

inpTweets = csv.reader(open('training_neatfile_4.csv', 'r', encoding='ISO-8859-1'), delimiter=',')
stopWords = getStopWordList('stopwords.txt')
featureList = []
tweets = []
for row in inpTweets:
    sentiment = row[0]
    tweet = row[1]
    processedTweet = processTweet(tweet)
    featureVector = getFeatureVector(processedTweet, stopWords)
    featureList.extend(featureVector)
    tweets.append((featureVector, sentiment))
# end loop

# Remove featureList duplicates
featureList = list(set(featureList))

# Generate the training set
training_set = nltk.classify.util.apply_features(extract_features, tweets)

# Train the Naive Bayes classifier
nb_classifier = nltk.NaiveBayesClassifier.train(training_set)

# Test the classifier
testTweet = 'He is a brainless kid'
processedTestTweet = processTweet(testTweet)
sentiment = nb_classifier.classify(extract_features(getFeatureVector(processedTestTweet, stopWords)))
print("testTweet = %s, sentiment = %s\n" % (testTweet, sentiment))
Training a NaiveBayesClassifier (or any classifier) takes a long time, depending on how much training data you feed it. It becomes much faster if you save the trained classifier object (nb_classifier) once, so you can skip re-training on every subsequent run.
The following is the way to save objects using pickle; you can use it in your code to save and load the classifier.
import pickle
pickle.dump(object, file)
You can save the NaiveBayesClassifier by pickling its object (nb_classifier) as follows:
with open('model.pkl', 'wb') as nb_classifier_model:
    pickle.dump(nb_classifier, nb_classifier_model)
Then, you can retrieve it as:
with open('model.pkl', 'rb') as nb_classifier_model:
    nb_classifier = pickle.load(nb_classifier_model)
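To tie the two halves together, a small helper can load the cached model when the file already exists and train-then-save otherwise. This is a sketch with hypothetical names (`load_or_train`, `demo.pkl`); the demo uses a stand-in dict as the "model" so it runs without NLTK or training data:

```python
import os
import pickle

def load_or_train(path, train_fn):
    """Return a cached model from `path`; otherwise train one and cache it."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    model = train_fn()  # the slow step, only taken on a cache miss
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    return model

# Demo with a stand-in "model" (a dict) instead of the NLTK classifier,
# so the sketch is runnable on its own.
model = load_or_train('demo.pkl', lambda: {'good': 1, 'bad': -1})
print(model['good'])
```

In your code the same helper would wrap the slow step as `nb_classifier = load_or_train('model.pkl', lambda: nltk.NaiveBayesClassifier.train(training_set))`. Note that if `extract_features` depends on `featureList`, that list must still be rebuilt each run (or pickled alongside the classifier).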
Load the saved model at startup instead of retraining, and the 10-15 minute wait goes away.
Hope it helps!