python nlp word2vec

'MeanEmbeddingVectorizer' object has no attribute 'transform'


Hello, I'm working on text classification. I have a dataset with 2 columns: one contains the text and the other contains the label. Since I'm a beginner, I'm following a Word2Vec tutorial step by step to understand whether it can work for my use case, but I keep getting this error.

This is my code:

class MeanEmbeddingVectorizer(object):
    def __init__(self, word2vec):
        self.word2vec = word2vec
        # if a text is empty we should return a vector of zeros
        # with the same dimensionality as all the other vectors
        self.dim = len(next(iter(word2vec.values())))
def fit(self, X, y):
        return self
def transform(self, X):
        return np.array([
            np.mean([self.word2vec[w] for w in words if w in self.word2vec]
                    or [np.zeros(self.dim)], axis=0)
            for words in X
        ])

train_df['clean_text_tok']=[nltk.word_tokenize(i) for i in train_df['clean_text']]
model = Word2Vec(train_df['clean_text_tok'],min_count=1)
w2v = dict(zip(model.wv.index_to_key, model.wv.vectors))
modelw = MeanEmbeddingVectorizer(w2v)
# converting text to numerical data using Word2Vec
X_train_vectors_w2v = modelw.transform(X_train_tok)
X_val_vectors_w2v = modelw.transform(X_test_tok)
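For reference, the averaging that transform is meant to perform can be illustrated without gensim. The sketch below is a standalone function, not the tutorial's code: the name mean_embedding and the 3-dimensional toy_w2v vectors are hypothetical stand-ins for the w2v dictionary built from the trained model. It averages the vectors of the tokens found in the vocabulary and falls back to zeros when none are found, so every document becomes one fixed-length row.

```python
import numpy as np


def mean_embedding(tokens, word_vectors, dim):
    """Average the vectors of tokens present in word_vectors; zeros if none are."""
    known = [word_vectors[w] for w in tokens if w in word_vectors]
    if not known:
        # No token is in the vocabulary: return a zero vector of the same size
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Hypothetical 3-dimensional vectors standing in for the gensim-derived w2v dict
toy_w2v = {
    "good": np.array([1.0, 0.0, 2.0]),
    "movie": np.array([3.0, 2.0, 0.0]),
}

docs = [["good", "movie"], ["unknown", "words"], ["good", "unseen"]]
vectors = np.array([mean_embedding(d, toy_w2v, dim=3) for d in docs])
print(vectors)
print(vectors.shape)  # one row per document, one column per vector dimension
```

The first document averages two known vectors, the second has no known words and gets zeros, and the third only uses the vector for "good".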

The error I'm getting is:

Dimension:  100
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-127-289141692350> in <module>
      4 modelw = MeanEmbeddingVectorizer(w2v)
      5 # converting text to numerical data using Word2Vec
----> 6 X_train_vectors_w2v = modelw.transform(X_train_tok)
      7 X_val_vectors_w2v = modelw.transform(X_test_tok)

AttributeError: 'MeanEmbeddingVectorizer' object has no attribute 'transform'


Solution

  • If your MeanEmbeddingVectorizer is defined in your code exactly as it's shown here, the .fit() and .transform() functions aren't indented, so they aren't part of the class as you likely intended. A def that starts at column 0 ends the class body, so both become ordinary module-level functions.

    Indenting each of them by an extra 4 spaces (as the source you copied this code from almost certainly intended!) puts them inside the MeanEmbeddingVectorizer class as methods. Objects of that class will then no longer raise the "no attribute" error.

    For example:

    import numpy as np

    class MeanEmbeddingVectorizer(object):
        def __init__(self, word2vec):
            self.word2vec = word2vec
            # if a text is empty we should return a vector of zeros
            # with the same dimensionality as all the other vectors
            self.dim = len(next(iter(word2vec.values())))

        def fit(self, X, y=None):
            # nothing to learn: the word vectors are already trained
            return self

        def transform(self, X):
            # X is an iterable of token lists; each document becomes the mean
            # of its known word vectors, or zeros if none of its words are known
            return np.array([
                np.mean([self.word2vec[w] for w in words if w in self.word2vec]
                        or [np.zeros(self.dim)], axis=0)
                for words in X
            ])
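A minimal, self-contained reproduction of the indentation issue makes the difference visible. The class names Broken and Fixed are hypothetical. A def that starts at column 0 closes the class body, so the function becomes a module-level name rather than an attribute of the class's instances. Its own body may still be indented by 8 spaces, which is why the original code raises no syntax error.

```python
class Broken(object):
    def __init__(self):
        self.value = 1
def transform(self, X):  # starts at column 0, so it ends the class body
        return X          # still legal: only needs indenting relative to its def


class Fixed(object):
    def __init__(self):
        self.value = 1

    def transform(self, X):  # indented under the class: a real method
        return X


print(hasattr(Broken(), "transform"))  # False: transform is a module-level function
print(hasattr(Fixed(), "transform"))   # True
print(Fixed().transform([1, 2]))       # [1, 2]
```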