Hello, I'm working on text classification. I have a dataset with two columns: one contains the text and the other the label. Since I'm a beginner, I'm following a Word2Vec tutorial step by step to understand whether it can work for my use case, but I keep getting this error.
This is my code:
class MeanEmbeddingVectorizer(object):
    def __init__(self, word2vec):
        self.word2vec = word2vec
        # if a text is empty we should return a vector of zeros
        # with the same dimensionality as all the other vectors
        self.dim = len(next(iter(word2vec.values())))
def fit(self, X, y):
    return self
def transform(self, X):
    return np.array([
        np.mean([self.word2vec[w] for w in words if w in self.word2vec]
                or [np.zeros(self.dim)], axis=0)
        for words in X
    ])
train_df['clean_text_tok']=[nltk.word_tokenize(i) for i in train_df['clean_text']]
model = Word2Vec(train_df['clean_text_tok'],min_count=1)
w2v = dict(zip(model.wv.index_to_key, model.wv.vectors))
modelw = MeanEmbeddingVectorizer(w2v)
# converting text to numerical data using Word2Vec
X_train_vectors_w2v = modelw.transform(X_train_tok)
X_val_vectors_w2v = modelw.transform(X_test_tok)
The error I'm getting is:
Dimension: 100
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-127-289141692350> in <module>
4 modelw = MeanEmbeddingVectorizer(w2v)
5 # converting text to numerical data using Word2Vec
----> 6 X_train_vectors_w2v = modelw.transform(X_train_tok)
7 X_val_vectors_w2v = modelw.transform(X_test_tok)
AttributeError: 'MeanEmbeddingVectorizer' object has no attribute 'transform'
If your MeanEmbeddingVectorizer is defined in your code exactly as it's shown here, the failure to indent the .fit() and .transform() functions means they're not part of the class, as you likely intended. Python treats them as ordinary module-level functions instead, so instances of the class never receive them.
Indenting each of them an extra 4 spaces (as was likely the intent of any source you copied this code from!) will put them "inside" the MeanEmbeddingVectorizer class, as methods of that class. Then, objects of that class won't give the same "no attribute" error.
For example:
import numpy as np

class MeanEmbeddingVectorizer(object):
    def __init__(self, word2vec):
        self.word2vec = word2vec
        # if a text is empty we should return a vector of zeros
        # with the same dimensionality as all the other vectors
        self.dim = len(next(iter(word2vec.values())))

    def fit(self, X, y):
        return self

    def transform(self, X):
        return np.array([
            np.mean([self.word2vec[w] for w in words if w in self.word2vec]
                    or [np.zeros(self.dim)], axis=0)
            for words in X
        ])
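If you want to convince yourself that indentation alone is what makes the difference, a small self-contained check along these lines may help. It is only a sketch: the names BrokenVectorizer, FixedVectorizer and toy_w2v are made up for illustration, and the tiny 3-dimensional dictionary stands in for the real vectors you would get from your trained model.

```python
import numpy as np


class BrokenVectorizer(object):
    def __init__(self, word2vec):
        self.word2vec = word2vec
        self.dim = len(next(iter(word2vec.values())))

# Not indented: this is a plain module-level function, NOT a method
# of BrokenVectorizer, so instances will not have a .transform attribute.
def transform(self, X):
    return X


class FixedVectorizer(object):
    def __init__(self, word2vec):
        self.word2vec = word2vec
        self.dim = len(next(iter(word2vec.values())))

    # Indented: these are methods of FixedVectorizer.
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Average the vectors of known words; fall back to a zero
        # vector when none of the words in a text are in the vocabulary.
        return np.array([
            np.mean([self.word2vec[w] for w in words if w in self.word2vec]
                    or [np.zeros(self.dim)], axis=0)
            for words in X
        ])


# Hypothetical toy embeddings, used only for this illustration.
toy_w2v = {
    "good": np.array([1.0, 0.0, 0.0]),
    "bad": np.array([0.0, 1.0, 0.0]),
}

broken_has_transform = hasattr(BrokenVectorizer(toy_w2v), "transform")
fixed_has_transform = hasattr(FixedVectorizer(toy_w2v), "transform")

# Two tokenized "documents": one with known words, one with none.
vectors = FixedVectorizer(toy_w2v).transform([["good", "bad"], ["unknown"]])

print(broken_has_transform, fixed_has_transform)
print(vectors.shape)
```

The only difference between the two classes is where the def lines start, which is the same difference as between the code in the question and the corrected version above.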