I am trying to run a Word2Vec (W2V) algorithm. I get an IndexError and am not sure where I am going wrong. Here's the error:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
and here's the code:
def makeFeatureVec(words, model, num_features):
    # Function to average all of the word vectors in a given
    # paragraph
    #
    # Pre-initialize an empty numpy array (for speed)
    featureVec = np.zeros((num_features,),dtype="float32")
    #
    nwords = 0.
    #
    # Index2word is a list that contains the names of the words in
    # the model's vocabulary. Convert it to a set, for speed
    index2word_set = set(model.wv.index2word)
    #
    # Loop over each word in the review and, if it is in the model's
    # vocabulary, add its feature vector to the total
    for word in words:
        if word in index2word_set:
            nwords = nwords + 1.
            featureVec = np.add(featureVec,model[word])
    #
    # Divide the result by the number of words to get the average
    featureVec = np.true_divide(featureVec,nwords)
    return featureVec
def getAvgFeatureVecs(reviews,model,num_features):
    # Given a set of reviews (each one a list of words), calculate
    # the average feature vector for each one and return a 2D numpy array
    #
    # Initialize a counter
    counter = 0.
    #
    # Preallocate a 2D numpy array, for speed
    reviewFeatureVecs = np.zeros((len(reviews),num_features),dtype="float32")
    #
    # Loop through the reviews
    for review in reviews:
        #
        # Print a status message every 1000th review
        if counter%1000. == 0.:
            print ("Review %d of %d" % (counter, len(reviews)))
        #
        # Call the function (defined above) that makes average feature vectors
        reviewFeatureVecs[counter] = makeFeatureVec(review, model,num_features)
        #
        # Increment the counter
        counter = counter + 1.
    return reviewFeatureVecs
This piece of code is from Kaggle's Bag of Words Meets Bags of Popcorn. I am not sure where the error is. I think np.divide is raising the error. I am working on Windows.
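counter = counter + 1.
should be
counter = counter + 1
(note the removed trailing dot) or, equivalently,
counter += 1
The trailing dot makes counter a float (since 1. is equivalent to 1.0), and floats cannot be used as array indices. The initialization counter = 0. needs the same change (counter = 0); otherwise the very first reviewFeatureVecs[counter] lookup still uses a float.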
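The failure and the fix can be reproduced in isolation. This is a minimal sketch that assumes only NumPy; `arr` and `rows` are hypothetical stand-ins for reviewFeatureVecs and the per-review vectors, not the code from the question.

```python
import numpy as np

# Hypothetical stand-in for reviewFeatureVecs: 3 "reviews", 2 features each.
arr = np.zeros((3, 2), dtype="float32")

# A float index raises IndexError, even when its value is a whole number.
try:
    arr[0.] = 1.0
    float_index_failed = False
except IndexError:
    float_index_failed = True

# An integer counter works. enumerate() supplies one automatically,
# so there is no manual "counter = counter + 1" to get wrong.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical feature vectors
for counter, row in enumerate(rows):
    arr[counter] = row

print(float_index_failed)
print(arr)
```

Using enumerate in the loop over reviews is one way to keep the counter an int throughout; initializing counter = 0 and incrementing with counter += 1 works just as well.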